Abstract

The upper bound on the capacity of a 3-node discrete memoryless relay channel is considered, where a source X wants to send information to destination Y with the help of a relay Z. Y and Z are independent given X, and the link from Z to Y is lossless with rate $R_0$. A new inequality is introduced to upper-bound the capacity when the encoding rate is beyond the capacities of both individual links XY and XZ. It is based on a generalization of the blowing-up lemma, which links conditional entropy to decoding error, and on a generalization of channel simulation to the case with side information. The achieved upper bound is strictly better than the well-known cut-set bound in several cases when the latter is $C_{XY} + R_0$, with $C_{XY}$ being the channel capacity between X and Y. One particular case is when the channel is statistically degraded, i.e., either Y is a statistically degraded version of Z with respect to X, or Z is a statistically degraded version of Y with respect to X. Moreover, in this case the bound is shown to be explicitly computable. The binary erasure channel is analyzed in detail and evaluated numerically.

Index Terms 

Network information theory, relay channel, outer bound, channel simulation, blowing-up lemma, Shannon theory

I. Introduction 

The relay channel model was first formulated by van der Meulen [1] in 1971; it consists of a source X, a relay Z, and a destination Y. The relay transmits a signal $X_1$ based on its observation to help Y. As a basic building block of general communication networks, it has since attracted much research interest; see, e.g., [?] and references therein.

A set of achievability results was introduced by Cover and El Gamal [3]. Decode-forward and compress-forward are two basic achievability methods. Several capacity results were established for degraded, reversely degraded [3], semi-deterministic [5], and deterministic [?] channels. They are all based on achieving the well-known cut-set bound with a certain coding scheme; see, e.g., Chapter 14 of [4].

In general, however, the cut-set bound does not appear to be tight. A result on this was shown by Zhang in 1988 [12] for the channel depicted in Figure 1. The link from the relay to the destination is assumed to be lossless with fixed rate $R_0$. Y and Z are conditionally independent given X. Furthermore, Y is a statistically degraded version of Z with respect to X. In other words, X, Z, and Y can be re-described as a Markov chain $X \to Z \to Y$. By applying the blowing-up lemma [?], it is shown by contradiction that the cut-set bound cannot be tight. However, it is still unknown how loose the bound is. In [?], a specific class of modulo-additive noise relay channels is considered. The relay observes a noisy version of the noise corrupting the signal at the destination. The capacity is established and shown to be strictly lower than the cut-set bound. To the best of the author's knowledge, there is no general upper bound tighter than the cut-set bound for the relay channel.

In this paper, we consider improving the cut-set bound for the channel depicted in Figure 1, similar to [12]. Nodes Y and Z are independent given X, and the link from the relay Z to the destination Y is lossless with rate $R_0$. Specifically, in a transmission of n channel uses, a "color" in $\{1, 2, \ldots, 2^{nR_0}\}$ can be sent to Y without error.
Fig. 1. Relay network with a lossless relay-destination link.



The cut-set bound for this relay channel is

$$\max_{p(x)} \min\{I(X;Y) + R_0,\ I(X;Y,Z)\}. \qquad (1)$$

It equals $C_{XY} + R_0$ in many cases when $R_0$ is small, where $C_{XY}$ denotes the channel capacity between X and Y. This is based on the following observation. Suppose that under input distribution $p^*(x)$, $I(X;Y)$ attains $C_{XY}$. Then, as long as $I(X;Y,Z) > I(X;Y)$ under $p^*(x)$, the cut-set bound is $C_{XY} + R_0$ whenever $R_0$ is such that $C_{XY} + R_0 < I(X;Y,Z)$, evaluated under $p^*(x)$.
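As a concrete illustration of (1), consider the conditionally i.i.d. binary erasure channel analyzed in Section IV, with erasure probability ε on both links. For an input with $\Pr(X=1) = q$, $I(X;Y) = (1-\epsilon)H_2(q)$ and $I(X;Y,Z) = (1-\epsilon^2)H_2(q)$, since X is lost only when both Y and Z are erased. Both terms increase with $H_2(q)$, so the uniform input is optimal and (1) reduces to $\min\{1-\epsilon+R_0,\ 1-\epsilon^2\}$. The Python sketch below only evaluates this closed form; the function name `cut_set_bound_bec` is ours, for illustration.

```python
def cut_set_bound_bec(eps: float, r0: float) -> float:
    """Cut-set bound (1) when X-Y and X-Z are independent BEC(eps) links and the
    relay-destination link has rate r0 (hypothetical helper for illustration).

    I(X;Y) = (1 - eps) H2(q) and I(X;Y,Z) = (1 - eps^2) H2(q) both increase with
    H2(q), so the outer max over p(x) is attained by the uniform input (H2(q) = 1).
    """
    c_xy = 1.0 - eps           # capacity of the X-Y link
    c_x_yz = 1.0 - eps ** 2    # I(X;Y,Z) under the uniform input
    return min(c_xy + r0, c_x_yz)


for r0 in (0.1, 0.25, 0.5):
    print(r0, cut_set_bound_bec(0.5, r0))
```

For ε = 0.5 the bound equals $C_{XY} + R_0$ as long as $R_0 \le 0.25$, which is the regime where the new bound of this paper is meant to improve on it.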

In this paper, a new bounding technique is introduced. It leads to an explicit upper bound strictly better than $C_{XY} + R_0$ when $R_0 > 0$ and the encoding rate is beyond both $C_{XY}$ and $C_{XZ}$. Specifically, we present the following results. First, we show an explicitly computable bound for the case when the channel is statistically degraded, that is, Z is a statistically degraded version of Y with respect to X, or Y is a statistically degraded version of Z. The bound is strictly lower than $C_{XY} + R_0$, and thereby directly improves the result in [12]. As an example, the binary erasure channel is analyzed in detail. Second, by extending the method of channel simulation [14], [15], we generalize the results to cases when the channel is not necessarily degraded.

The essential idea of this bounding technique is to introduce a fundamentally new inequality on any feasible rate, in addition to Fano's inequality [4]. In our case, Fano's inequality manifests as $R \le C_{XY} + R_0 - \frac{1}{n}H(\hat{Z}^n|X^n) + o(1)$, where $\hat{Z}^n$ denotes the color the relay sends to Y and $X^n$ is the codeword. Our new inequality is established by combining two observations for any feasible code:

• First, it is known that the probability of correct decoding for a memoryless channel decays exponentially when the encoding rate is beyond capacity. This is universal and independent of the encoding/decoding technique. Moreover, the exponent is explicitly computable [9].

• Second, any feasible rate and associated encoding/decoding scheme provide a way for node Y¹ to guess the codeword $X^n$ based solely on its own signal $Y^n$, as follows. Since the rate is feasible, there must be a decoding function which maps $y^n$ and the color to the correct codeword. So node Y only needs to guess the color $\hat{Z}^n$. To accomplish this, one notices that, when $\frac{1}{n}H(\hat{Z}^n|X^n)$ is close to zero, the color is essentially a deterministic function of $X^n$, even though the XZ channel is random. So if Y can generate a random variable with the same distribution as $Z^n$ given $X^n$, there is a good chance for Y to guess the color. This guessing is achieved by generalizing the blowing-up lemma. Overall, the probability of successful decoding can be bounded from below.

Based on the first observation, the probability of success in the second observation must be less than the universal bound, and thereby the second inequality is established. With Fano's inequality and ours at hand, it will be clear that the second inequality becomes active when $\frac{1}{n}H(\hat{Z}^n|X^n)$ is small, and it bounds the rate away from the cut-set bound.

One critical step in our method is to generate a random variable with the same distribution as $Z^n$ (or $Y^n$) given $X^n$. When the channel is statistically degraded, this task is straightforward. In more general cases, a method based on channel simulation [14], [15] can be applied. Channel simulation [14], [15] aims at generating random variables in an "efficient" way. In our case, where side information is available (e.g., when Y needs to "simulate" $Z^n$), a generalization of the known results is derived and applied to achieve the new inequality.

¹We will also consider the case where node Z makes the guess.

The rest of the paper is organized as follows. Section II introduces the basic definitions, notations, and a well-known bound on the probability of correct decoding when the encoding rate is beyond channel capacity. Section III generalizes the blowing-up lemma and links it to conditional entropy. Section IV applies it to characterize the bound for the case when Y and Z are i.i.d. given X; it also works through the binary erasure channel in detail. Section V then generalizes the results to the case when the channel is statistically degraded. Section VI presents channel simulation and generalizes it to the case when side information is available. This is then applied to our relay channel in Section VII to obtain a general bound. Finally, Section VIII concludes with some remarks.

II. Definitions, notations and a well-known bound on decoding probability 

The memoryless relay channel we consider consists of three nodes, sender X, relay Z, and destination Y, defined by the conditional distribution $p(y,z|x)$. Y and Z are independent given X, i.e., $p(y,z|x) = p(y|x)p(z|x)$. The values of X, Y, and Z are from finite spaces $\Omega_X$, $\Omega_Y$, and $\Omega_Z$, respectively. Correspondingly, for a transmission of length n, the codeword $x^n$ is chosen from $\Omega_X^n$, the product space of $\Omega_X$, and the received observations are $y^n \in \Omega_Y^n$ and $z^n \in \Omega_Z^n$, respectively. The link from the relay to the destination is a lossless link with rate $R_0$. Namely, for a transmission of n channel uses, a number from $\{1, 2, \ldots, 2^{nR_0}\}$ can be sent to Y without error.

A coding strategy of rate R for n channel uses is defined by a 3-tuple $(\mathcal{C}^{(n)}, g_n(z^n), f_n(\hat{z}^n, y^n))$. The set $\mathcal{C}^{(n)} := \{x^n(m), m = 1, \ldots, 2^{nR}\}$ is the codebook at the source X. Node X chooses one codeword uniformly from the set and transmits it over the channel. The function $g_n(z^n)$ is the encoding function at the relay Z, which maps an observation $z^n$ to $\hat{z}^n$, a "color" j in $\{1, 2, \ldots, 2^{nR_0}\}$. In this paper, we also use $\hat{z}^n$ to denote this mapping function, and call the set $\{1, 2, \ldots, 2^{nR_0}\}$ the color set. The function $f_n(\hat{z}^n, y^n)$ is the decoding function at the destination Y, mapping the color from the relay and the observation $y^n$ to a codeword in $\mathcal{C}^{(n)}$. All of $\mathcal{C}^{(n)}$, $g_n(\cdot)$, and $f_n(\cdot,\cdot)$ are known to all nodes.

Definition 1. Rate R is feasible if there exists a sequence of coding strategies of rate R, $\{(\mathcal{C}^{(n)}, g_n(z^n), f_n(\hat{z}^n, y^n)), n \ge 1\}$, such that the probability of successful decoding approaches one as n goes to infinity. That is,

$$\lim_{n\to\infty} \Pr\left(f_n(\hat{Z}^n, Y^n) = X^n\right) = 1.$$

We introduce several notations here. 

• $C_{XY}$ and $C_{XZ}$ are the channel capacities of the channels X-Y and X-Z, respectively.

• The notation $d_H(x_1, x_2)$ denotes the Hamming distance between two points.

• Throughout the paper, log is to base 2. Also, we reserve the hat symbol on top of a random variable (e.g., $\hat{\omega}$) solely for the coloring.

We now quote the result on the probability of correct decoding when transmitting at a rate above a channel's capacity.

A. Decoding Probability When Based on $Y^n$ Only

Consider only the transmission between X and Y, and ignore Z. That is, the destination Y wants to decode the codeword using $Y^n$ only. When the codebook has rate above the capacity, it is well known that the probability of correct decoding approaches zero exponentially fast. The following is shown in [9].

Theorem 1. Suppose that a discrete memoryless channel with an input alphabet of K letters $\{a_1, \ldots, a_K\}$ and an output alphabet of J letters $\{b_1, \ldots, b_J\}$ is described by transition probabilities $P_{jk} = p(b_j|a_k)$. Then, for any block length n and any codebook of size $M = 2^{nR}$, the probability of correct decoding satisfies

$$\Pr(\text{correct decoding}) \le 2^{-n\left(-\rho R + \min_{\mathbf{p}} \Phi_0(\rho, \mathbf{p})\right)}, \quad \forall \rho \in [-1, 0), \qquad (2)$$

where $\mathbf{p} = \{p_k\}$ represents a distribution over the input alphabet, and

$$\Phi_0(\rho, \mathbf{p}) := -\log \sum_{j=1}^{J} \left( \sum_{k=1}^{K} p_k P_{jk}^{1/(1+\rho)} \right)^{1+\rho}.$$

In the paper, we denote the largest exponent as

$$\mathcal{E}(R) := \max_{\rho \in [-1, 0)} \left(-\rho R + \min_{\mathbf{p}} \Phi_0(\rho, \mathbf{p})\right). \qquad (3)$$

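Remark 1. ([9]) It is easy to show that $\mathcal{E}(R) > 0$ for any given $R > C_{XY}$. Also note that

$$\lim_{\rho \to 0^-} \frac{1}{\rho} \min_{\mathbf{p}} \Phi_0(\rho, \mathbf{p}) = \lim_{\rho \to 0^+} \frac{1}{\rho} \max_{\mathbf{p}} \Phi_0(\rho, \mathbf{p}) = C_{XY}.$$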
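The exponent (3) can be approximated numerically by brute force. The sketch below, written in Python under our own simplifying assumptions, restricts to binary-input channels, scans ρ over the open interval (-1, 0) (at ρ = -1 the power $1/(1+\rho)$ in $\Phi_0$ is undefined as written), and takes the inner minimum over a grid of input distributions. The names `phi0` and `exponent_grid` are hypothetical; this is an approximation for illustration, not the method of [9].

```python
import math


def phi0(rho, p, W):
    """Phi_0(rho, p) = -log2 sum_j (sum_k p_k W[k][j]^(1/(1+rho)))^(1+rho), -1 < rho < 0.

    W[k][j] = P(b_j | a_k): rows are inputs, columns are outputs.
    """
    s = 1.0 / (1.0 + rho)
    total = 0.0
    for j in range(len(W[0])):
        inner = sum(pk * W[k][j] ** s for k, pk in enumerate(p))
        total += inner ** (1.0 + rho)
    return -math.log2(total)


def exponent_grid(R, W, rho_steps=500, p_steps=100):
    """Grid approximation of E(R) in (3) for a binary-input channel."""
    best = -math.inf
    for i in range(1, rho_steps):
        rho = -i / rho_steps  # stays strictly inside (-1, 0)
        # inner minimum over input distributions (t/p_steps, 1 - t/p_steps)
        worst = min(phi0(rho, (t / p_steps, 1 - t / p_steps), W)
                    for t in range(p_steps + 1))
        best = max(best, -rho * R + worst)
    return best


eps = 0.5
bec = [[1 - eps, 0.0, eps], [0.0, 1 - eps, eps]]  # outputs: 0, 1, erasure
for R in (0.4, 0.55, 0.7):
    print(R, exponent_grid(R, bec))
```

Consistent with Remark 1, the value is essentially zero for R below $C_{XY} = 0.5$ and positive above it. A finite grid can only under-estimate the maximum over ρ.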

III. Generalizing the Blowing-Up Lemma 

The well-known blowing-up lemma [?], [10] states that if an event $A^{(n)}$ in a product probability space has probability diminishing more slowly than exponentially, then the event consisting of all points within a small Hamming distance of $A^{(n)}$ will have probability going to one. More precisely, it is the following.

Lemma 1. (The Blowing-Up Lemma) Let $Q_1, Q_2, \ldots, Q_n$ be independent random variables in a finite space $\Omega$, with distributions $P_{Q_i}$, respectively. Denote the random vector $Q^n := (Q_1, \ldots, Q_n)$ and the joint distribution $P_{Q^n} := \prod_{i=1}^{n} P_{Q_i}$. Suppose there exist $\epsilon_n \to 0$ and an event $A^{(n)} \subseteq \Omega^n$ such that $\Pr(Q^n \in A^{(n)}) > 2^{-n\epsilon_n}$. Then there exist $\delta_n, \eta_n \to 0$ such that $\Pr(Q^n \in \Gamma_{n\delta_n}(A^{(n)})) > 1 - \eta_n$, where $\Gamma_l(A^{(n)}) := \{x^n : \min_{y^n \in A^{(n)}} d_H(x^n, y^n) \le l\}$ is the "blown-up" set.

This lemma can be generalized to the case without the subexponential requirement on the event probability, as follows.

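Lemma 2. Let $Q_1, Q_2, \ldots, Q_n$ be independent random variables in a finite space $\Omega$, with distributions $P_{Q_i}$, respectively. Denote the random vector $Q^n := (Q_1, \ldots, Q_n)$ and the joint distribution $P_{Q^n} := \prod_{i=1}^{n} P_{Q_i}$. Suppose that an event $A^{(n)} \subseteq \Omega^n$ is such that $\Pr(Q^n \in A^{(n)}) > 2^{-nc_n}$ for $c_n > 0$. Then for any $\lambda > 1$,

$$\Pr\left(Q^n \in \Gamma_{n\sqrt{\lambda c_n}}(A^{(n)})\right) > 1 - 1/\lambda.$$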



Proof: The proof follows Marton's proof [10] and the summary in El Gamal's slides [?]. Please see the details in Appendix A. □

As can be seen, the above two lemmas consider how much (in Hamming distance) one should blow up an event set so that the enlarged set has non-trivial probability.
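To make the blown-up set $\Gamma_l(A)$ of Lemmas 1 and 2 concrete, the Python sketch below enumerates it by brute force for a tiny block length and computes its probability under an i.i.d. product measure (the special case of Lemma 1 with identical marginals). The helper names are hypothetical, and exhaustive enumeration of $\Omega^n$ is only feasible for very small n; the sketch illustrates the definition, not the lemmas' asymptotics.

```python
import itertools


def hamming(a, b):
    """Hamming distance d_H between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def blow_up(A, n, alphabet, radius):
    """Gamma_radius(A) = {x^n : min_{y^n in A} d_H(x^n, y^n) <= radius}, by enumeration."""
    return {x for x in itertools.product(alphabet, repeat=n)
            if min(hamming(x, a) for a in A) <= radius}


def product_prob(S, marginal):
    """Pr(Q^n in S) when Q_1, ..., Q_n are i.i.d. with the given marginal pmf (a dict)."""
    total = 0.0
    for x in S:
        prob = 1.0
        for sym in x:
            prob *= marginal[sym]
        total += prob
    return total


n = 10
A = {(0,) * n}            # a single point, so Pr(A) = 2^-10 under Bernoulli(1/2)
fair = {0: 0.5, 1: 0.5}
for radius in (0, 2, 4, 5):
    print(radius, product_prob(blow_up(A, n, (0, 1), radius), fair))
```

Even though A itself has probability $2^{-10}$, blowing it up by radius $n/2$ already captures more than half of the probability mass.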

A similar result is needed for the relay channel we consider. Recall that node X sends a codeword $X^n$ uniformly picked from its codebook $\mathcal{C}^{(n)}$. This generates an observation $Z^n$ at node Z, which has a color $\hat{Z}^n$. As will be shown later, the conditional entropy $H(\hat{Z}^n|X^n)$ is a key parameter in bounding the feasible rate away from the cut-set bound. Given $H(\hat{Z}^n|X^n) = na_n$, we show that there exist a Hamming distance determined by $a_n$, a non-trivial set of codewords $\mathcal{C}_1^{(n)} \subseteq \mathcal{C}^{(n)}$, and a set of special colors associated with each such codeword satisfying the following. If a codeword $x^n$ from $\mathcal{C}_1^{(n)}$ is sent, then for each special color j of $x^n$, blow up the set of $z^n$'s of color j by the specified distance. Then this new set has non-trivial probability. Specifically, we have the following.



Theorem 2. Assume that $H(\hat{Z}^n|X^n) = na_n$. Then for any given $\lambda > 1$, there exists a set of codewords $\mathcal{C}_1^{(n)}$ satisfying the following:

• $\Pr(X^n \in \mathcal{C}_1^{(n)}) > 1 - 1/\lambda$;

• For each codeword $x^n$ in $\mathcal{C}_1^{(n)}$, there is a set of colors $S(x^n) \subseteq \{1, \ldots, 2^{nR_0}\}$ such that $\Pr(\hat{Z}^n \in S(x^n) | X^n = x^n) > 1 - 1/\lambda$. Furthermore, for each $j \in S(x^n)$, we have

$$\Pr\left(Z^n \in \Gamma_{n\lambda^{3/2}\sqrt{a_n}}(A_j^{(n)}) \,\middle|\, X^n = x^n\right) > 1 - 1/\lambda > 0,$$

where $A_j^{(n)} := \{z^n \in \Omega_Z^n : \hat{z}^n = j\}$.

Proof: Please see Appendix A. □

IV. Upper Bound When Y and Z Are Conditionally I.I.D. Given X

In this section, we consider the case when Y and Z are conditionally i.i.d. given X. That is, $\Omega_Y = \Omega_Z := \Omega$, and for all $\omega \in \Omega$ and $x \in \Omega_X$, $\Pr(Y = \omega | X = x)$ equals $\Pr(Z = \omega | X = x)$. Two inequalities on any feasible rate are introduced, both taking $H(\hat{Z}^n|X^n)$ as a parameter.

The first one is Fano's inequality as follows. 

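Lemma 3. [Fano's Inequality] Denote $H(\hat{Z}^n|X^n) = na_n$. For any feasible rate R, we have $R \le C_{XY} + R_0 - a_n + o(1)$, as $n \to \infty$.

Proof: Since the codebook is feasible, we have $H(X^n) = nR$ and, by Fano's lemma [4], $H(X^n|Y^n, \hat{Z}^n) = n \cdot o(1)$. So

$$n(R + o(1)) = I(X^n; Y^n, \hat{Z}^n) = I(X^n; Y^n) + H(\hat{Z}^n|Y^n) - H(\hat{Z}^n|X^n) \le nC_{XY} + nR_0 - H(\hat{Z}^n|X^n), \qquad (4)$$

where the second equality uses $H(\hat{Z}^n|X^n, Y^n) = H(\hat{Z}^n|X^n)$, because $\hat{Z}^n$ is a function of $Z^n$, which is independent of $Y^n$ given $X^n$.

□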

Now we introduce the following definition. 

Definition 2. A ball of radius r centered at a point $x_0^n$ in a space $\Omega^n$ is denoted as $\mathrm{Ball}_{x_0^n}(r)$, and is defined as the set of points in $\Omega^n$ that are within Hamming distance r of $x_0^n$. When r is not an integer, the smallest integer no less than r is used instead. In the paper, we often write $\mathrm{Ball}(r)$ when there is no confusion.

The following holds for the volume of a ball, i.e., the number of points it encloses.

Remark 2. For a fixed constant $p \in [0, 1]$, we have $|\mathrm{Ball}(np)| = \sum_{i=0}^{\lceil np \rceil} \binom{n}{i} (|\Omega| - 1)^i$. By Lemma 17.5.1 in the 2006 edition of [4], we have $\frac{1}{n} \log |\mathrm{Ball}(np)| \le p \log |\Omega| + H_2(p) + o(1)$, where the $o(1)$ term is only a function of n, and $H_2(p)$ is the binary entropy function $-p \log p - (1-p) \log(1-p)$.
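The exact count in Remark 2 is easy to evaluate, which lets one see how the normalized log-volume compares with $p \log|\Omega| + H_2(p)$. The Python sketch below (hypothetical helper names, our own example parameters $|\Omega| = 3$ and $p = 0.1$) rounds a non-integer radius up as in Definition 2. For $p \le 1 - 1/|\Omega|$ the normalized log-volume approaches $H_2(p) + p\log(|\Omega|-1)$, which lies below the bound used in this paper.

```python
import math


def ball_volume(n: int, radius: float, q: int) -> int:
    """|Ball(radius)| in Omega^n with |Omega| = q; a non-integer radius is rounded up."""
    r = min(n, math.ceil(radius))
    return sum(math.comb(n, i) * (q - 1) ** i for i in range(r + 1))


def h2(p: float) -> float:
    """Binary entropy in bits (0 at the endpoints)."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def normalized_log_volume(n: int, p: float, q: int) -> float:
    """(1/n) log2 |Ball(np)|; math.log2 accepts arbitrarily large Python ints."""
    return math.log2(ball_volume(n, n * p, q)) / n


p, q = 0.1, 3
for n in (50, 200, 1000):
    print(n, normalized_log_volume(n, p, q), p * math.log2(q) + h2(p))
```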

The second inequality is the following. It hinges on the fact that any decoding strategy based solely on $y^n$ is subject to the inequality in Theorem 1, whereas, given a feasible strategy, a procedure for node Y to guess $X^n$ can be derived from it.

Theorem 3. Assume that Y and Z are i.i.d. given X, and $H(\hat{Z}^n|X^n) = na_n$. Also assume that rate $R > C_{XY}$ is achievable. Then for all $\lambda > 1$, there exist $\delta_n$ going to zero, determined by n and λ only, and an integer $N_1$, such that for $n > N_1$,

$$\frac{1}{n} \log \left|\mathrm{Ball}\left(n\lambda^{3/2}\sqrt{a_n}\right)\right| + \delta_n \ge \mathcal{E}(R),$$

where $\mathcal{E}(R)$ is defined in (3) for the XY channel.

Proof: We present the main ideas here. The detailed proof is in Appendix B.

By definition, a feasible coding strategy has an associated decoding function $f_n(\cdot,\cdot)$ at node Y, which correctly maps $(\hat{Z}^n, Y^n)$ to the codeword with probability approaching one. So to construct a decoding strategy for node Y that depends on $Y^n$ only, one natural way is to let Y guess the color $\hat{Z}^n$ and then apply $f_n(\cdot,\cdot)$.
The following strategy is proposed. Node Y paints every point in $\Omega^n$ with the same color node Z would paint it, namely $\hat{z}^n$. Once it receives $Y^n$, node Y draws a Hamming ball of radius $n\lambda^{3/2}\sqrt{a_n}$ around $Y^n$ in $\Omega^n$. Then it randomly and uniformly picks a point in the ball and uses its color as a guess of $\hat{Z}^n$.

We now show that the probability of guessing $\hat{Z}^n$ correctly this way is at least of order $c_1 / |\mathrm{Ball}(n\lambda^{3/2}\sqrt{a_n})|$, with $c_1 > 0$ being a constant. Note that if this is true, then the theorem follows immediately by applying Theorem 1.

Indeed, by Theorem 2, with probability $p_1 > 0$, the color $\hat{Z}^n$ of $Z^n$ is from a special color set $S(X^n)$ of the transmitted codeword $X^n$. For each such special color, say j, blowing up all the points in $\Omega^n$ of color j by Hamming distance $n\lambda^{3/2}\sqrt{a_n}$ results in a set with the following property. If one generates a random variable based on the distribution of $Z^n$ given $X^n$, then this random variable will be in this set with probability no less than $p_1$. Since $Y^n$ is such a random variable, $Y^n$ is within distance $n\lambda^{3/2}\sqrt{a_n}$ of a j-colored point with probability no less than $p_1$. Thus overall, the probability that our strategy guesses the color correctly, which is equivalent to guessing $X^n$ correctly, is no less than $c_1 / |\mathrm{Ball}(n\lambda^{3/2}\sqrt{a_n})|$, with $c_1 > 0$ being a function of $p_1$. □
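The guessing step used in the proof of Theorem 3 (pick a uniform point in a Hamming ball around $y^n$ and read off its color) can be sketched in Python as follows. Uniform sampling is done by first drawing the distance d with probability proportional to $\binom{n}{d}(|\Omega|-1)^d$, the number of points at exactly distance d, and then changing d uniformly chosen positions to uniformly chosen different symbols. The function names and `hypothetical_coloring` are illustrative stand-ins of ours; in the paper the coloring is the relay's map $z^n \mapsto \hat{z}^n$ of a feasible code.

```python
import math
import random


def sample_uniform_in_ball(center, radius, alphabet, rng):
    """Uniform point of Ball_center(radius) in alphabet^n (radius rounded up, Definition 2)."""
    n, q = len(center), len(alphabet)
    r = min(n, math.ceil(radius))
    # number of points at exact distance d; exact ints, fine for the small n used here
    weights = [math.comb(n, d) * (q - 1) ** d for d in range(r + 1)]
    d = rng.choices(range(r + 1), weights=weights)[0]
    point = list(center)
    for i in rng.sample(range(n), d):
        point[i] = rng.choice([a for a in alphabet if a != center[i]])
    return tuple(point)


def guess_color(y, radius, alphabet, coloring, rng):
    """Node Y's guess of the relay color: the color of a uniform point near y^n."""
    return coloring(sample_uniform_in_ball(y, radius, alphabet, rng))


def hypothetical_coloring(w):
    """Stand-in for the relay's coloring map, for illustration only."""
    return sum(w) % 4


rng = random.Random(1)
print(guess_color((0, 1, 2, 0, 1, 2), 1.5, (0, 1, 2), hypothetical_coloring, rng))
```

If some point of the correct color lies in the ball, a single uniform draw hits it with probability at least $1/|\mathrm{Ball}|$, which is the source of the $c_1/|\mathrm{Ball}(\cdot)|$ term above.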

Combining Lemma 3 and Theorem 3 gives the following main theorem.

Theorem 4. Assume that Y and Z are i.i.d. given X. Then there exists $a \in [0, R_0]$ such that any feasible rate R larger than $C_{XY}$ satisfies: $R - C_{XY} \le R_0 - a$ and $\mathcal{E}(R) \le H_2(\sqrt{a}) + \sqrt{a}\log|\Omega|$.

Proof: Assume that $H(\hat{Z}^n|X^n)/n = a_n$. From Lemma 3, Theorem 3, and Remark 2, we know $R - C_{XY} \le R_0 - a_n + o(1)$ and $\mathcal{E}(R) \le H_2(\lambda^{3/2}\sqrt{a_n}) + \lambda^{3/2}\sqrt{a_n}\log|\Omega| + o(1)$. Suppose $\limsup a_n = a$, which exists because $a_n$ is bounded in $[0, R_0]$. Then $R - C_{XY} \le R_0 - a$ and $\mathcal{E}(R) \le H_2(\lambda^{3/2}\sqrt{a}) + \lambda^{3/2}\sqrt{a}\log|\Omega|$. Because this is valid for any $\lambda > 1$, the theorem follows. □

The following is immediate by Remark 1 and the fact that $H_2(\sqrt{a}) + \sqrt{a}\log|\Omega|$ is continuous in a and equals zero at $a = 0$.

Corollary 1. When Y and Z are i.i.d. given X, and $R > C_{XY}$ is feasible, R is strictly less than $C_{XY} + R_0$.

Now we take the binary erasure channel (BEC) as an example for detailed analysis. 

Example: Detailed Analysis of the BEC. Suppose both XY and XZ are conditionally i.i.d. binary erasure channels with erasure probability ε, as defined by $\Pr(y = x|x) = 1 - \epsilon$, $\Pr(y = \text{erasure}|x) = \epsilon$, $\forall x \in \{0, 1\}$.

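The corresponding $\mathcal{E}(R)$ can be determined as follows. The detailed derivation is in Appendix B.

$$\mathcal{E}(R) = \begin{cases} R \log \dfrac{R}{1-\epsilon} + (1-R) \log \dfrac{1-R}{\epsilon}, & 1 - \epsilon < R < 1 - \dfrac{\epsilon}{2-\epsilon}, \\[2ex] R - \log(2-\epsilon), & R \ge 1 - \dfrac{\epsilon}{2-\epsilon}. \end{cases}$$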

With this, Theorem 4 can be applied to find the bound on the achievable rate numerically for any given $R_0$. Figure 2 shows the case ε = 0.5. The bound is nevertheless very close to the cut-set bound.
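A minimal Python sketch of how Theorem 4 can be evaluated for the BEC is given below; it is our own illustration, not the code behind Fig. 2, and its printed numbers are not results from the paper. It assumes the closed-form $\mathcal{E}(R)$ above, $|\Omega| = 3$ (outputs 0, 1, erasure), and computes $\max_{a \in [0, R_0]} \min\{C_{XY} + R_0 - a,\ \mathcal{E}^{-1}(H_2(\sqrt{a}) + \sqrt{a}\log 3)\}$, where $\mathcal{E}^{-1}(t)$ is the largest R with $\mathcal{E}(R) \le t$. All function names are hypothetical.

```python
import math


def h2(p):
    """Binary entropy in bits (0 at the endpoints)."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def exponent_bec(R, eps):
    """Closed-form E(R) for a BEC(eps); returns 0 for R at or below capacity."""
    cap = 1.0 - eps
    if R <= cap:
        return 0.0
    knee = 2.0 * (1.0 - eps) / (2.0 - eps)  # equals 1 - eps / (2 - eps)
    if R < knee:
        return R * math.log2(R / cap) + (1.0 - R) * math.log2((1.0 - R) / eps)
    return R - math.log2(2.0 - eps)


def largest_rate_with_exponent_at_most(t, eps, r_max):
    """Largest R in [1 - eps, r_max] with E(R) <= t, by bisection (E is nondecreasing)."""
    lo, hi = 1.0 - eps, r_max
    if exponent_bec(hi, eps) <= t:
        return hi
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if exponent_bec(mid, eps) <= t:
            lo = mid
        else:
            hi = mid
    return lo


def theorem4_bound_bec(eps, r0, grid=4000):
    """Max over a in [0, r0] of the smaller of the two constraints in Theorem 4.

    The grid a = r0 * (i / grid)^4 is dense near a = 0, where the constraints
    cross; a finite grid can only under-estimate the true maximum.
    """
    cap = 1.0 - eps
    best = cap
    for i in range(grid + 1):
        a = r0 * (i / grid) ** 4
        s = math.sqrt(a)
        t = h2(s) + s * math.log2(3)
        r = min(cap + r0 - a, largest_rate_with_exponent_at_most(t, eps, cap + r0))
        best = max(best, r)
    return best


for r0 in (0.05, 0.1, 0.2):
    print(r0, 1 - 0.5 + r0, theorem4_bound_bec(0.5, r0))
```

Each printed line shows $R_0$, the cut-set value $C_{XY} + R_0$ (valid for these small $R_0$), and the Theorem 4 bound, which stays strictly below but very close to the cut-set value, as stated above.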



V. When the Channel is Statistically Degraded 

In this section, we extend the result in the previous section to the case when the channel is statistically 
degraded. We say Z is a statistically degraded version of F with respect to X if there exists a transition 
probability distribution qi{z\y) such that p{z\x) = J2y9ii^\y)piy\^)- Accordingly we say that channel 
XYZ is degraded. Similarly, F is a statistically degraded version of Z with respect to X if there exists 
a probability distribution q2{y\z) such that p{y\x) = '^z1'2iy\^)Pi^\^)- ^^^^ case, channel XZY is 
degraded. Note that [il2l considers the case when XZY is statistically degraded. 
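The degradedness condition $p(z|x) = \sum_y q_1(z|y)p(y|x)$ is a matrix product of the two channel matrices, so it can be checked mechanically. The Python sketch below assumes our own example, not one from the paper: X-Y is a BEC($\epsilon_1$), and $q_1$ erases each unerased symbol with probability $\epsilon_2$, which makes X-Z a BEC($\epsilon_1 + \epsilon_2 - \epsilon_1\epsilon_2$). The helper names are hypothetical.

```python
def compose(p_y_given_x, q_z_given_y):
    """Matrix product sum_y p(y|x) q(z|y): rows indexed by x, columns by z."""
    return [[sum(p_row[y] * q_z_given_y[y][z] for y in range(len(p_row)))
             for z in range(len(q_z_given_y[0]))]
            for p_row in p_y_given_x]


def bec(eps):
    """BEC(eps): rows are inputs 0, 1; columns are outputs 0, 1, erasure."""
    return [[1 - eps, 0.0, eps], [0.0, 1 - eps, eps]]


def erase_further(e2):
    """q1(z|y): an unerased symbol is erased with prob. e2; erasures stay erased."""
    return [[1 - e2, 0.0, e2], [0.0, 1 - e2, e2], [0.0, 0.0, 1.0]]


def is_degraded_via(p_y_given_x, p_z_given_x, q_z_given_y, tol=1e-12):
    """True if p(z|x) equals sum_y q(z|y) p(y|x) entrywise, up to tol."""
    composed = compose(p_y_given_x, q_z_given_y)
    return all(abs(a - b) <= tol
               for ra, rb in zip(composed, p_z_given_x) for a, b in zip(ra, rb))


e1, e2 = 0.3, 0.2
print(is_degraded_via(bec(e1), bec(e1 + e2 - e1 * e2), erase_further(e2)))
```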
Fig. 2. Numerical result on the BEC when Pr(erasure) = 0.5. Note that $C_{XY} = C_{XZ} = 0.5$, while the capacity between X and (Y, Z) is $1 - 0.5^2 = 0.75$.



A. When XYZ is Statistically Degraded 

The following procedure can be employed by Y to decode $X^n$ solely based on its observation $Y^n$. At the i-th transmission, upon receiving an observation $Y_i$, node Y generates a random variable $\tilde{Z}_i$ based on the transition probability $q_1(z|y)$; thus, for the observed $Y^n$, a sequence $\tilde{Z}^n$ is generated. Now consider the relay channel formed by X, Z, and $\tilde{Z}$; see Figure 3. It is obvious that Z and $\tilde{Z}$ are i.i.d. given X. The same procedure as in Section IV, namely the method for $\tilde{Z}$ (which is actually node Y) to guess $\hat{Z}^n$ and the derivation of the decoding probability, can be applied. This leads to the following results, similar to Theorems 3 and 4.




Fig. 3. Augmented network when XYZ is degraded. $\tilde{Z}$ is generated based on $q_1(z|y)$.
Theorem 5. Suppose that XYZ is statistically degraded. Denote $H(\hat{Z}^n|X^n) = na_n$. Then for all $\lambda > 1$, there exists $\delta_n \to 0$, determined by n and λ only, such that $\frac{1}{n}\log|\mathrm{Ball}(n\lambda^{3/2}\sqrt{a_n})| + \delta_n \ge \mathcal{E}_Y(R)$ for $R > C_{XY}$, where the ball is taken in $\Omega_Z^n$. Here $\mathcal{E}_Y(R)$ is as defined in (3) for the XY channel.

Theorem 6. Suppose that XYZ is statistically degraded. Then there exists $a \in [0, R_0]$ such that any achievable rate R larger than $C_{XY}$ satisfies: $R - C_{XY} \le R_0 - a$ and $\mathcal{E}_Y(R) \le H_2(\sqrt{a}) + \sqrt{a}\log|\Omega_Z|$.
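The local step of Section V.A, in which node Y passes each received symbol $Y_i$ through $q_1(z|y)$ to produce $\tilde{Z}_i$, can be simulated directly. The Python sketch below reuses our own assumed example (X-Y a BEC($\epsilon_1$), $q_1$ an extra erasure with probability $\epsilon_2$) and checks empirically that $\tilde{Z}^n$ has the erasure statistics of Z, i.e. of a BEC($\epsilon_1 + \epsilon_2 - \epsilon_1\epsilon_2$). The symbol encoding and function names are hypothetical.

```python
import random

ERASURE = 2  # our encoding of the output alphabet: 0, 1, and 2 for an erasure


def bec_output(x, eps, rng):
    """One use of a BEC(eps) on input x."""
    return ERASURE if rng.random() < eps else x


def degrade_symbol(y, e2, rng):
    """Sample Z~_i from q1(z|y): erase an unerased symbol with prob. e2; erasures stay."""
    if y == ERASURE:
        return ERASURE
    return ERASURE if rng.random() < e2 else y


def simulate_z_tilde(y_seq, e2, rng):
    """Node Y's local step: apply q1 symbol by symbol to the observed Y^n."""
    return [degrade_symbol(y, e2, rng) for y in y_seq]


rng = random.Random(7)
n, e1, e2 = 100_000, 0.3, 0.2
x = [rng.randint(0, 1) for _ in range(n)]
y = [bec_output(xi, e1, rng) for xi in x]
z_tilde = simulate_z_tilde(y, e2, rng)
print(sum(s == ERASURE for s in z_tilde) / n, e1 + e2 - e1 * e2)
```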

B. When XZY is Statistically Degraded 

The upper bound for this case can be derived by considering the probability of correct decoding when node Z tries to decode $X^n$ solely based on $Z^n$, as follows.




Fig. 4. Augmented network when XZY is degraded. Node Z now tries to decode $X^n$ solely based on $Z^n$. $\tilde{Y}$ is generated based on $q_2(y|z)$. $\tilde{Z}$ is a random variable with the same distribution as Z given X.

Build a new channel based on the relay channel XYZ, as depicted in Figure 4. First, add a new random variable $\tilde{Z}$ which is independent of the others given X and has the same distribution as Z given X. Then add another random variable $\tilde{Y}$ based on Z as follows. Whenever Z is received, node Z generates $\tilde{Y}$ according to $q_2(y|z)$. Thus we have a new channel $XZ\tilde{Y}\tilde{Z}$. Finally, add a lossless link of rate $R_0$ from $\tilde{Z}$ to $\tilde{Y}$. Since the channels XYZ and $X\tilde{Y}\tilde{Z}$ are statistically equivalent, any rate achievable by the XYZ channel must be achievable by the channel $XZ\tilde{Y}\tilde{Z}$, where $(Z, \tilde{Y})$ is considered as one single node. To see this, given observation $\tilde{Z}^n$, node $\tilde{Z}$ maps it to a color based on the same mapping from $Z^n$ to $\hat{Z}^n$. For any feasible coding strategy, node $(Z, \tilde{Y})$ then invokes the associated decoding function $f_n(\cdot, \tilde{Y}^n)$ on this color to decode $X^n$.

Now consider the channel $XZ\tilde{Y}\tilde{Z}$. Node Z can guess $X^n$ based solely on $Z^n$ by the following procedure. Assume $H(\hat{Z}^n|X^n) = na_n$, and fix a constant $\lambda > 1$. Node Z draws a ball of radius $n\lambda^{3/2}\sqrt{a_n}$ around $Z^n$. Because Z and $\tilde{Z}$ are i.i.d. given X, as in the proof of Theorem 3, a point with the color of $\tilde{Z}^n$ lies in the ball with non-diminishing probability. Randomly picking a point $\omega^n$ in the ball, node Z announces $f_n(\hat{\omega}^n, \tilde{Y}^n)$ as the codeword. By an argument similar to that of the previous section, the following is true.

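Theorem 7. Assume XZY is statistically degraded. Denote $H(\hat{Z}^n|X^n) = na_n$. Then for all $\lambda > 1$, there exists $\delta_n$ going to zero, determined by n and λ only, such that, with the ball taken in $\Omega_Z^n$,

$$\frac{1}{n} \log \left|\mathrm{Ball}\left(n\lambda^{3/2}\sqrt{a_n}\right)\right| + \delta_n \ge \mathcal{E}_Z(R).$$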

Theorem 8. Assume XZY is statistically degraded. Then there exists $a \in [0, R_0]$ such that any achievable rate R larger than $C_{XZ}$ satisfies: $R - C_{XY} \le R_0 - a$ and $\mathcal{E}_Z(R) \le H_2(\sqrt{a}) + \sqrt{a}\log|\Omega_Z|$.
VI. Channel simulation with side information

In the previous two sections, the new inequality is based on the decoding error probability. The key step is for a node (e.g., Y) to guess the color of another node's observation (e.g., $Z^n$) by generating a random variable with the same distribution given $X^n$. This is readily doable when the channel is statistically degraded. General cases need a new method. In this and the next section, we show that this can be done by generalizing results from channel simulation. To the best of the author's knowledge, this is the first time channel simulation has been applied to analyzing relay channel capacity. For a clear presentation, we first introduce channel simulation and, in this section, generalize a basic result to suit our purpose. In the next section, the result will be applied to bound the relay channel capacity.

A. Channel Simulation and its Adaptation for the Relay Channel Considered 

Channel simulation (CS-Basic). In its original formulation, channel simulation (e.g., [14], [15]) concerns the following general problem. Suppose there is a source $U^n$, randomly generated according to distribution $p(u^n)$, and a channel defined by $p(v^n|u^n)$; see Figure 5. The channel output is denoted as $V^n$, with distribution $p(v^n)$. The task of channel simulation is then to efficiently design a $\tilde{U}^n$ with a certain cardinality and an associated distribution $\tilde{p}(u^n)$ such that, when one feeds the channel with inputs drawn from $\tilde{p}(u^n)$, the induced output distribution $\tilde{p}(v^n)$ is close to $p(v^n)$ in the sense that

$$d(\tilde{V}^n, V^n) := \frac{1}{2}\sum_{v^n} \left|\tilde{p}(v^n) - p(v^n)\right| \to 0.$$

The optimization focuses on minimizing the cardinality of the support of $\tilde{p}(u^n)$. Note that $d(\tilde{V}^n, V^n)$ also equals $\max_A |\Pr(\tilde{V}^n \in A) - \Pr(V^n \in A)|$.
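The identity between half the $\ell_1$ distance and the largest difference of event probabilities can be checked by brute force on a small alphabet, enumerating every event A. The Python sketch below uses two toy pmfs of our own choosing; the function names are hypothetical.

```python
from itertools import combinations


def tv_half_l1(p, q):
    """d = (1/2) * sum_v |p(v) - q(v)| for two pmfs given as equal-length lists."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def tv_max_over_events(p, q):
    """max over all events A of |P(A) - Q(A)|, enumerating every subset (tiny alphabets only)."""
    idx = range(len(p))
    best = 0.0
    for r in range(len(p) + 1):
        for A in combinations(idx, r):
            best = max(best, abs(sum(p[i] for i in A) - sum(q[i] for i in A)))
    return best


p = [0.5, 0.3, 0.2]
q = [0.2, 0.3, 0.5]
print(tv_half_l1(p, q), tv_max_over_events(p, q))
```

The maximizing event is the set of outcomes where one pmf exceeds the other, which is why the factor 1/2 is needed for the two expressions to agree.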



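Fig. 5. Channel simulation in its original formulation (CS-Basic). Top: the original channel $p(v^n|u^n)$ to be simulated, with input $U^n \sim p(u^n)$ and output $V^n \sim p(v^n)$; bottom: the simulated channel, in which the same $p(v^n|u^n)$ is driven by $\tilde{U}^n$.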

For bounding the capacity of the relay channel under consideration, we adapt the channel simulation 
formulation to the following. 

Channel Simulation with Side Information and Common Randomness (CS-SICR). The following channel is considered, as shown in Figure 6. The source node X produces $X^n$, which is generated from a codebook $\mathcal{C}^{(n)} = \{c_1, \ldots, c_M\}$ with probability distribution $p(X^n = c_j) = 1/M$ for all j. The channel output is $Z^n$. Moreover, there is also a random variable $Y^n$ as side information. The channel is defined by $p(y^n, z^n|x^n)$, and the random variables have a joint distribution $p(x^n, y^n, z^n)$.

The channel simulation procedure is as follows. A "channel encoder" sees the source $X^n$, the side information $Y^n$, and a "common" random variable K which is uniformly distributed on $\{1, 2, \ldots, 2^{nR_2}\}$, where $R_2$ is a constant. It determines a (simulation) codeword $U \in \{1, 2, \ldots, 2^{nR_1}\}$ based on an encoding function $\phi_n(x^n, y^n, k)$, which is a probability distribution on $\{1, 2, \ldots, 2^{nR_1}\}$. There is a "channel decoder" which also observes $Y^n$ and K. Upon receiving U, it generates an output random variable $\tilde{Z}^n$ based on a function $\psi_n(u, y^n, k)$. Suppose the joint distribution among the random variables is $q(x^n, y^n, z^n, u, k)$. The objective of the channel simulation is to design $\phi_n(\cdot,\cdot,\cdot)$ and $\psi_n(\cdot,\cdot,\cdot)$ such that

$$\sum_{x^n, y^n, z^n} \left| p(x^n, y^n, z^n) - p(x^n, y^n)\, Q(z^n|x^n, y^n) \right| \to 0, \quad n \to \infty, \qquad (5)$$

where $Q(z^n|x^n, y^n)$ is the conditional distribution induced from the joint distribution $q(x^n, y^n, z^n, u, k)$.





Fig. 6. Channel simulation with side information and common randomness (CS-SICR). Top: the channel $p(y^n, z^n|x^n)$ to be simulated. Bottom: given $X^n$, $Y^n$, and K, the channel encoder applies $\phi_n$ to generate the code U; the channel decoder applies $\psi_n(u, y^n, k)$ to generate $\tilde{Z}^n$, which simulates $Z^n$.



Remark 3. Note that the above formulation is based on [15]. Compared to [?], there are two differences. First, the source $X^n$ is not generated from i.i.d. random variables $X_1, X_2, \ldots, X_n$ based on a distribution $p(x)$; instead, here the source is uniformly picked from a codebook. Second, there is side information $Y^n$ in our formulation.

B. Why CS-SICR Can Be Used for Bounding the Capacity of the Relay Channel 

Before deriving results for this special channel simulation, we first briefly explain why the seemingly unrelated channel simulation can be applied to bounding the capacity of the relay channel. Suppose such a simulation procedure has been established by designing $\phi_n(\cdot,\cdot,\cdot)$ and $\psi_n(\cdot,\cdot,\cdot)$. Then, if U were given, node Y in our relay channel would be able to use $\psi_n(u, y^n, k)$ to generate a random variable with the same distribution² as $Z^n$ given $X^n$. This is because Y knows $Y^n$ and the common randomness K. Thereafter, one follows the procedures in the previous sections for node Y to guess the codeword $X^n$, which leads to the new inequality. However, here U is an unknown element in $\{1, 2, \ldots, 2^{nR_1}\}$. Thus our new guessing strategy starts by picking a random element in $\{1, 2, \ldots, 2^{nR_1}\}$ as a guess of U.

Based on this reasoning, the optimization in the channel simulation is to minimize $R_1$.

C. Results on Channel Simulation 

A few definitions need to be introduced. 

²The exact meaning will be made clear later.
Definition 3. For a pair of random variables $U^n$ and $V^n$ with joint distribution $p(u^n, v^n)$, the point mutual information is defined as the random variable $i(U^n; V^n) := \log \frac{p(U^n, V^n)}{p(U^n)p(V^n)}$. Note that $I(U^n; V^n) = E\, i(U^n; V^n)$.

Similarly, when there exists another random variable $Y^n$, define the conditional point mutual information $i(U^n; V^n|Y^n)$ as $\log \frac{p(U^n, V^n|Y^n)}{p(U^n|Y^n)p(V^n|Y^n)}$.

Definition 4. The limsup in probability of a sequence of random variables $\{T_n\}$ is the smallest β such that for all ε > 0, $\lim_n \Pr(T_n > \beta + \epsilon) = 0$. The liminf in probability of $\{T_n\}$ is the largest α such that for all ε > 0, $\lim_n \Pr(T_n < \alpha - \epsilon) = 0$.
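In the simplest i.i.d. setting, the normalized point mutual information $\frac{1}{n} i(U^n; V^n)$ is an average of i.i.d. per-symbol terms, so by the weak law of large numbers its limsup and liminf in probability both equal $I(U;V)$; the definitions above matter for non-i.i.d. inputs such as codebooks, where the two can differ. The Python sketch below illustrates the i.i.d. case under our own assumptions (uniform U through a BSC(δ)); the function names are hypothetical.

```python
import math
import random


def normalized_point_mi_bsc(n, delta, rng):
    """(1/n) i(U^n; V^n) for i.i.d. uniform U through a BSC(delta).

    With uniform input, p(v) = 1/2 per symbol, so each symbol contributes
    log2(p(v|u) / p(v)) = log2(2 p(v|u)); only whether the symbol flipped matters.
    """
    total = 0.0
    for _ in range(n):
        flipped = rng.random() < delta
        total += math.log2(2 * (delta if flipped else 1 - delta))
    return total / n


def mutual_information_bsc(delta):
    """I(U;V) = 1 - H2(delta) for uniform input."""
    h = -delta * math.log2(delta) - (1 - delta) * math.log2(1 - delta)
    return 1 - h


rng = random.Random(2024)
samples = [normalized_point_mi_bsc(5000, 0.1, rng) for _ in range(5)]
print(mutual_information_bsc(0.1), samples)
```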

A basic result of channel simulation is the following lemma. It shows that for basic channel simulation (CS-Basic), the limsup in probability of the normalized point mutual information is the required rate.

Lemma 4. [Section IV around Equation (4.1) in [14]]³ In CS-Basic, assume that random variables $U^n$ and $V^n$ have joint distribution $p(u^n, v^n)$ and marginal distributions $p(u^n)$ and $p(v^n)$, respectively. Define $\bar{I}(U;V) := \limsup \frac{1}{n} \log \frac{p(U^n, V^n)}{p(U^n)p(V^n)}$ (in probability). For a given γ > 0, generate $M = 2^{n\bar{I}(U;V) + n\gamma}$ i.i.d. random variables $\tilde{U}_j^n$, $j = 1, \ldots, M$, according to $p(u^n)$. Assume $\tilde{U}_j^n = c_j$, $j = 1, \ldots, M$. Define an associated distribution $P_{\tilde{V}^n[c_1, \ldots, c_M]}(v^n) := \frac{1}{M}\sum_{j=1}^{M} p(v^n|c_j)$. Then $\lim_n E\, d(V^n, \tilde{V}^n[c_1, \ldots, c_M]) = 0$.
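Lemma 4 is an asymptotic soft-covering statement. The Python sketch below is only a toy illustration under our own assumptions, not a verification of the lemma: it fixes n = 8, uses i.i.d. uniform $U^n$ through a BSC(0.2) (so $p(v^n)$ is uniform and $I(U;V) = 1 - H_2(0.2) \approx 0.28$ bits per symbol, i.e. about 2.2 bits per block), draws M random codewords, and computes exactly the distance $d$ between the induced output distribution and $p(v^n)$. The function names are hypothetical.

```python
import itertools
import random


def bsc_likelihood(u, v, delta):
    """p(v^n | u^n) for a memoryless BSC(delta)."""
    d = sum(a != b for a, b in zip(u, v))
    return delta ** d * (1 - delta) ** (len(u) - d)


def soft_covering_tv(n, M, delta, seed=0):
    """d(p(v^n), induced output of M i.i.d. uniform codewords), computed exactly.

    U is i.i.d. uniform on {0, 1}, so the true output p(v^n) is uniform too.
    """
    rng = random.Random(seed)
    codebook = [tuple(rng.randint(0, 1) for _ in range(n)) for _ in range(M)]
    tv = 0.0
    for v in itertools.product((0, 1), repeat=n):
        induced = sum(bsc_likelihood(c, v, delta) for c in codebook) / M
        tv += abs(induced - 2.0 ** (-n))
    return 0.5 * tv


for M in (2, 16, 256):
    print(M, soft_covering_tv(8, M, 0.2))
```

Once M is well above $2^{nI(U;V)}$ the induced output is already close to $p(v^n)$, which is the effect Lemma 4 quantifies as $n \to \infty$.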

For channel simulation CS-SICR, not surprisingly, the limsup of the normalized conditional point mutual information is the rate needed.

Theorem 9. Consider the channel simulation problem CS-SICR with side information $Y^n$ and common randomness K. For any δ > 0, there exist a channel simulation encoder $\phi_n(x^n, y^n, k)$ and decoder $\psi_n(u, y^n, k)$ with rate $R_1 = \limsup i(X^n; Z^n|Y^n)/n + \delta$ and $R_2$ sufficiently large such that

$$\sum_{x^n, y^n, z^n} \left| p(x^n, y^n, z^n) - p(x^n, y^n)\, Q(z^n|x^n, y^n) \right| \to 0, \quad n \to \infty, \qquad (6)$$

where $Q(z^n|x^n, y^n)$ is the conditional distribution induced from the joint distribution $q(x^n, y^n, z^n, u, k)$.

Proof: The theorem is a generalization of the result in [15]. The details can be found in Appendix C. □

VII. A GENERAL UPPER BOUND BASED ON CHANNEL SIMULATION WITH SIDE INFORMATION 

With the preparation in the previous section, we are now ready to present a general upper bound for the relay channel under consideration. We use "the relay channel" to refer to the channel depicted in Figure 1 and defined in Section II.

First we introduce a companion channel to the relay channel. 

Definition 5. Suppose the relay channel in Figure 1 is defined by a conditional distribution $p(y, z|x) = p(y|x)p(z|x)$, and $\mathcal{C}^{(n)} := \{c_1, \ldots, c_M\}$ is a codebook for a feasible coding strategy. A companion (simulated) channel is a memoryless channel defined by a conditional distribution $\tilde{p}(y, z|x)$ which satisfies the following:

i) For all x, y, and z, $\tilde{p}(y|x) = p(y|x)$ and $\tilde{p}(z|x) = p(z|x)$. That is, the marginals are the same.

ii) The input distribution is $\tilde{p}(X^n = c) = 1/M$ for all $c \in \mathcal{C}^{(n)}$.

Furthermore, we use $\tilde{C}_{XY}$ and $\tilde{C}_{XYZ}$ to denote the capacities between X-Y and X-(Y, Z), respectively, in this companion channel.

One notices immediately that $\tilde{C}_{XY} = C_{XY}$.

Remark 4. For the relay channel, if there exists $q_1(z|y)$ such that $p(z|x) = \sum_y q_1(z|y)p(y|x)$, then we can choose $\tilde{p}(y, z|x) = p(y|x)q_1(z|y)$. In the case when $Y$ and $Z$ are i.i.d. given $X$, this leads to

$$\tilde{p}(y, z|x) = p(y|x) \cdot \mathbb{1}[z = y].$$
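As a concrete sketch of Remark 4 (not taken from the paper), consider the binary erasure channel (BEC) with erasure probability $\epsilon$ on both links, so that $Y$ and $Z$ are i.i.d. given $X$. The Python code below builds the actual relay channel $p(y|x)p(z|x)$ and the companion channel $p(y|x)\cdot\mathbb{1}[z = y]$, checks that their marginals agree, and evaluates $I(X; Y, Z)$ under the uniform input, which we assume is capacity-achieving here by symmetry. All function names and the value $\epsilon = 0.3$ are illustrative.

```python
import math
from collections import defaultdict

def bec(eps):
    """BEC transition probabilities p(b|x); 'e' denotes an erasure."""
    return {0: {0: 1 - eps, "e": eps}, 1: {1: 1 - eps, "e": eps}}

def relay_channel(eps):
    """Actual relay channel: Y and Z conditionally independent given X, both BEC(eps)."""
    W = bec(eps)
    return {x: {(y, z): W[x][y] * W[x][z] for y in W[x] for z in W[x]} for x in W}

def companion_channel(eps):
    """Companion channel of Remark 4 with q1(z|y) = 1[z = y], i.e. Z is a copy of Y."""
    W = bec(eps)
    return {x: {(y, y): W[x][y] for y in W[x]} for x in W}

def marginal(channel, idx):
    """Marginal p(y|x) (idx=0) or p(z|x) (idx=1) of a channel with outputs (y, z)."""
    out = {}
    for x, row in channel.items():
        m = defaultdict(float)
        for yz, w in row.items():
            m[yz[idx]] += w
        out[x] = dict(m)
    return out

def mutual_information(p_x, channel):
    """I(X; B) in bits for input pmf p_x and channel[x][b] = p(b|x)."""
    p_b = defaultdict(float)
    for x, px in p_x.items():
        for b, w in channel[x].items():
            p_b[b] += px * w
    return sum(px * w * math.log2(w / p_b[b])
               for x, px in p_x.items()
               for b, w in channel[x].items() if w > 0)

uniform = {0: 0.5, 1: 0.5}
eps = 0.3
print(mutual_information(uniform, bec(eps)))                # I(X;Y)
print(mutual_information(uniform, relay_channel(eps)))      # I(X;Y,Z), actual relay channel
print(mutual_information(uniform, companion_channel(eps)))  # I(X;Y,Z), companion channel
```

Under the companion channel, $I(X; Y, Z)$ collapses to $I(X; Y) = 1 - \epsilon$, while the actual relay channel gives $1 - \epsilon^2$; this is the mechanism behind choosing a companion channel with $\tilde{C}_{XYZ} = \tilde{C}_{XY}$ in the degraded case.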

¹The concept was first introduced in [13].
For the companion channel, we have the following two lemmas relating point mutual information to channel capacity.

Lemma 5. If $\liminf i(X^n; Y^n)/n = \tilde{C}_{XY} - c_1$ with $c_1 \ge 0$, then

$$\limsup \frac{1}{n} i(X^n; Z^n|Y^n) \le \tilde{C}_{XYZ} - \tilde{C}_{XY} + c_1.$$

Proof: Since the channel is memoryless, finite, and discrete, the communication channel between $X$ and $(Y, Z)$ satisfies the strong converse property. By Lemma 10 in [14], we know that $\limsup \frac{1}{n} i(X^n; Y^n, Z^n) \le \tilde{C}_{XYZ}$. Since $i(X^n; Y^n, Z^n) = i(X^n; Y^n) + i(X^n; Z^n|Y^n)$ and $\liminf \frac{1}{n} i(X^n; Y^n) = \tilde{C}_{XY} - c_1$, the conclusion follows. □
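The proof relies on the pointwise chain rule $i(x^n; y^n, z^n) = i(x^n; y^n) + i(x^n; z^n|y^n)$, which holds for every realization and not only on average. The sketch below checks this identity numerically for a single letter ($n = 1$) on a randomly generated joint pmf over small alphabets; the helper names, alphabet sizes, and seeds are illustrative assumptions.

```python
import itertools
import math
import random

def random_joint_pmf(sizes, seed=0):
    """Random strictly positive pmf on the product alphabet of the given sizes."""
    rng = random.Random(seed)
    cells = list(itertools.product(*(range(s) for s in sizes)))
    weights = [rng.random() + 0.01 for _ in cells]
    total = sum(weights)
    return {c: w / total for c, w in zip(cells, weights)}

def marginalize(p, keep):
    """Marginal pmf on the coordinate indices listed in `keep`."""
    out = {}
    for c, w in p.items():
        key = tuple(c[i] for i in keep)
        out[key] = out.get(key, 0.0) + w
    return out

def chain_rule_gap(p):
    """Largest |i(x; y,z) - i(x; y) - i(x; z|y)| over all (x, y, z), in bits."""
    px, py = marginalize(p, [0]), marginalize(p, [1])
    pxy, pyz = marginalize(p, [0, 1]), marginalize(p, [1, 2])
    gap = 0.0
    for (x, y, z), pxyz in p.items():
        i_x_yz = math.log2(pxyz / (px[(x,)] * pyz[(y, z)]))
        i_x_y = math.log2(pxy[(x, y)] / (px[(x,)] * py[(y,)]))
        # i(x; z|y) = log p(x,z|y) / (p(x|y) p(z|y)) = log p(x,y,z) p(y) / (p(x,y) p(y,z))
        i_x_z_given_y = math.log2(pxyz * py[(y,)] / (pxy[(x, y)] * pyz[(y, z)]))
        gap = max(gap, abs(i_x_yz - i_x_y - i_x_z_given_y))
    return gap

print(chain_rule_gap(random_joint_pmf((3, 2, 4), seed=1)))  # only floating-point round-off
```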

Lemma 6. Suppose the code book size is $2^{nR}$ in the relay channel. If $\liminf \frac{1}{n} i(X^n; Y^n) \le C_{XY} - c_1$ for $c_1 \ge 0$, then there exist $c_2 \ge 0$ and $n_k$ going to infinity such that:

i) $\lim_k \frac{1}{n_k} I(X^{n_k}; Y^{n_k}) \le C_{XY} - c_2$; and

ii) $c_2$ is positive if $c_1$ is positive.

Proof: See Appendix D. □
Now we present the following main result, which is a generalization of Theorem 6 in Section V.

Theorem 10. Suppose code book $\mathcal{C}^{(n)}$ of rate $R$ is feasible for the relay channel, and $\liminf i(X^n; Y^n)/n = C_{XY} - c_1$. Then for any companion channel $\tilde{p}(y, z|x)$, there exist constants $c_2 \ge 0$ and $a \ge 0$ such that:

i) $R \le C_{XY} - c_2 + R_0 - a$;

ii) $\mathcal{E}_Y(R) \le \tilde{C}_{XYZ} - \tilde{C}_{XY} + c_1 + H_2(\sqrt{a}) + \sqrt{a}\log|\Omega_Z|$;

iii) $c_2$ is positive when $c_1 > 0$, as identified in Lemma 6.

Proof: We sketch the main ideas here; the detailed proof is in Appendix D.

i) and iii) are due to Fano's lemma and Lemma 6, as follows. Denote $a_n := H(\hat{Z}^n|X^n)/n$ for the relay channel, where $\hat{Z}^n$ is the color of $Z^n$ sent by the relay. For any $n$, we know $H(X^n) = nR$ and, by Fano's lemma, $H(X^n|Y^n, \hat{Z}^n) = n \cdot o(1)$. As before, this leads to

$$R \le I(X^n; Y^n)/n + R_0 - a_n + o(1).$$

Thus, since $\liminf i(X^n; Y^n)/n = C_{XY} - c_1$, by Lemma 6 we know there exists $n_k \to \infty$ such that $R \le C_{XY} - c_2 + R_0 - a_{n_k} + o(1)$. Denoting $a := \limsup a_{n_k}$, we have $R \le C_{XY} - c_2 + R_0 - a$. This gives i) and iii).

ii) can be shown by applying the channel simulation results to the companion channel $\tilde{p}(y, z|x)$.

By Lemma 5 and Theorem 9, for any $\delta > 0$, with rate $R_1 := \tilde{C}_{XYZ} - \tilde{C}_{XY} + c_1 + \delta$, one can simulate the channel $\tilde{p}(x^n, y^n, z^n)$ based on side information $Y^n$ and common randomness $K$. This involves constructing $\phi_n(x^n, y^n, k)$ and $\psi_n(u, y^n, k)$. In the relay channel, node $Y$ can use this to produce a $\tilde{Z}^n$ whose distribution is close to that of the relay's observation, as follows. To generate a channel simulation output via $\psi_n(\cdot, \cdot, \cdot)$, it needs $U$, $K$, and $Y^n$. It has $Y^n$ and $K$. For $U$, there are $2^{nR_1}$ possibilities in total. Node $Y$ picks an element $\hat{U}$ uniformly from $\{1, 2, \dots, 2^{nR_1}\}$ and generates a $\tilde{Z}^n$ based on $\psi_n(\hat{U}, Y^n, K)$. Note that the probability of hitting the correct one, i.e., $\hat{U} = U$, is at least $2^{-nR_1}$.

Given $\tilde{Z}^n$, node $Y$ can apply the same procedure and argument as in Section IV to guess $X^n$. Specifically, it draws a ball of radius $n\lambda^{3/2}\sqrt{a_n}$ around $\tilde{Z}^n$ in the space $\Omega_Z^n$, for a constant $\lambda > 1$. Then it picks a point $\omega^n$ uniformly in the ball and applies the known decoding function $f_n(\hat{\omega}^n, Y^n)$ to guess $X^n$, where $\hat{\omega}^n$ is the 'color' of $\omega^n$.

Now we analyze the decoding probability of the above procedure. Decoding succeeds if both of the following conditions hold. First, node $Y$ chooses the correct $\hat{U}$ to simulate the correct $\tilde{Z}^n$, i.e., $\hat{U} = U$. Second, given a correct $\tilde{Z}^n$, node $Y$ hits the correct color in the ball of radius $n\lambda^{3/2}\sqrt{a_n}$ around $\tilde{Z}^n$. We hence have

$$\Pr(\text{Node } Y \text{ can decode correctly}) \ge 2^{-nR_1} \cdot \frac{\mu_1}{|\mathrm{Ball}(n\lambda^{3/2}\sqrt{a_n})|},$$

where $\mu_1 > 0$ is a constant.

Based on the result of Arimoto [9], one must have

$$\mathcal{E}_Y(R) \le R_1 + \limsup_{n} \frac{1}{n}\log\left|\mathrm{Ball}(n\lambda^{3/2}\sqrt{a_n})\right|.$$

Plugging in the bound on the ball's volume as in Remark 2, the above inequality leads to the desired claim ii). □
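For reference, a standard volume bound for a Hamming ball of radius $nr$ in $\Omega_Z^n$ is $|\mathrm{Ball}(nr)| \le 2^{n(H_2(r) + r\log|\Omega_Z|)}$ for $0 \le r \le 1/2$; we assume this is the form of the bound referred to in Remark 2. With $r = \lambda^{3/2}\sqrt{a}$ and $\lambda \to 1$, it produces the term $H_2(\sqrt{a}) + \sqrt{a}\log|\Omega_Z|$ in ii). The sketch below compares exact ball sizes with the bound; the function names and parameter values are illustrative.

```python
import math

def hamming_ball_size(n, radius, q):
    """Number of sequences in Omega^n (|Omega| = q) within Hamming distance <= radius of a centre."""
    kmax = min(n, math.floor(radius))
    return sum(math.comb(n, k) * (q - 1) ** k for k in range(kmax + 1))

def h2(r):
    """Binary entropy function in bits."""
    if r <= 0.0 or r >= 1.0:
        return 0.0
    return -r * math.log2(r) - (1 - r) * math.log2(1 - r)

def ball_exponent_bound(r, q):
    """H_2(r) + r log2 q, an upper bound on (1/n) log2 |Ball(nr)| for 0 <= r <= 1/2."""
    return h2(r) + r * math.log2(q)

# The normalized exact exponent stays below the bound as n grows.
for n in (20, 100, 400):
    exact = math.log2(hamming_ball_size(n, 0.1 * n, 3)) / n
    print(n, round(exact, 4), round(ball_exponent_bound(0.1, 3), 4))
```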



A. Discussion on Theorem 10

When $XYZ$ is statistically degraded, one can choose the companion channel such that $\tilde{C}_{XYZ} = \tilde{C}_{XY}$ and $c_1 = c_2 = 0$. More specifically, one can make $X$, $Y$, $Z$ form a Markov chain $X \to Y \to Z$. This shows that the bound for the case when $XYZ$ is degraded² is a special case of Theorem 10.



There are certainly cases where $\tilde{C}_{XYZ} > \tilde{C}_{XY}$ no matter how one chooses the companion channel. In these cases, looking at Theorem 10 alone, one can choose $a = 0$ and $c_1 = c_2 = 0$ without violating either i) or ii) for $R$ slightly larger than $C_{XY}$ (i.e., $R_0$ close to zero). The effective bound then becomes i), which is the same as the cut-set bound in this regime. When $R_0$ gets larger, the inequality in ii) becomes the effective bound. At this point, our new bound deviates from the cut-set bound and is strictly better.



VIII. CONCLUDING REMARKS

The paper presents a new technique for upper-bounding the capacity of the relay channel. A bound strictly better than the cut-set bound is achieved. One of the essential ideas is to let one node simulate another node's observation.

However, requiring a lossless link between the relay and the destination makes the model quite different from the original relay channel in [1]. It remains unclear how fundamental this requirement is to the new bounding method.

Interestingly, it is in general possible for the cut-set bound to be tight even when the encoding rate is larger than the capacities of both the $XY$ and $XZ$ channels. For example, consider the following deterministic relay channel³:

• $\Omega_X = \{1, 2, 3, 4\}$, $\Omega_Y = \{\text{'A'}, \text{'B'}\}$, and $\Omega_Z = \{\text{'C'}, \text{'D'}\}$;

• $Y = \text{'A'}$ for all $X \in \{1, 2\}$; $Y = \text{'B'}$ for all $X \in \{3, 4\}$;

• $Z = \text{'C'}$ for all $X \in \{1, 3\}$; $Z = \text{'D'}$ for all $X \in \{2, 4\}$;

• There is a lossless link of rate $R_0$ from $Z$ to $Y$.
Note that $C_{XY} = C_{XZ} = 1$. For this channel, the following strategy can send $1 + R_0$ bits per channel use from $X$ to $Y$ when $R_0 < 1$. First, construct a code book $\mathcal{C}_1$ of rate 1 based on hypothetical symbols $\{a, b\}$, denoted $\mathcal{C}_1 := \{\alpha^n(w_1) : w_1 = 1, \dots, 2^n\}$. Then construct a code book $\mathcal{C}_2$ of rate $R_0$ based on hypothetical symbols $\{c, d\}$, denoted $\mathcal{C}_2 := \{\beta^n(w_2) : w_2 = 1, \dots, 2^{nR_0}\}$. To send a message $(w_1, w_2)$, node $X$ compares $\alpha^n(w_1)$ and $\beta^n(w_2)$ and produces a codeword as follows. For each position $i$,

$$x_i = \begin{cases} 1, & \text{if } \alpha_i(w_1) = a,\ \beta_i(w_2) = c; \\ 2, & \text{if } \alpha_i(w_1) = a,\ \beta_i(w_2) = d; \\ 3, & \text{if } \alpha_i(w_1) = b,\ \beta_i(w_2) = c; \\ 4, & \text{if } \alpha_i(w_1) = b,\ \beta_i(w_2) = d. \end{cases}$$

It is easy to check that $w_1$ can be decoded by node $Y$ and $w_2$ by node $Z$. Node $Z$ can then forward this message to $Y$ over the lossless link.
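The scheme can be checked mechanically. The sketch below hard-codes the channel and the position-wise mapping from the text, and uses a toy block length $n = 4$ with all $2^4$ sequences over $\{a, b\}$ as $\mathcal{C}_1$ and an arbitrary four-word code book over $\{c, d\}$ as $\mathcal{C}_2$ (so $R_0 = 1/2$). The block length and the particular $\mathcal{C}_2$ are assumptions made for illustration.

```python
import itertools
import math

# Deterministic relay channel of the example: Y and Z are functions of X.
Y_OF_X = {1: "A", 2: "A", 3: "B", 4: "B"}
Z_OF_X = {1: "C", 2: "D", 3: "C", 4: "D"}
# Per-position map (alpha_i, beta_i) -> x_i, following the case table above.
X_OF = {("a", "c"): 1, ("a", "d"): 2, ("b", "c"): 3, ("b", "d"): 4}

n = 4
# C1: rate-1 code book over {a, b} (all 2^n sequences).
C1 = ["".join(t) for t in itertools.product("ab", repeat=n)]
# C2: an illustrative rate-1/2 code book over {c, d} (2^{n R0} = 4 words).
C2 = ["cccc", "cdcd", "dcdc", "dddd"]

def encode(w1, w2):
    """Node X: combine alpha^n(w1) and beta^n(w2) position by position."""
    return [X_OF[(a, b)] for a, b in zip(C1[w1], C2[w2])]

def decode(x):
    """Y recovers w1 from its own output; Z recovers w2 and forwards the index."""
    y = [Y_OF_X[s] for s in x]
    z = [Z_OF_X[s] for s in x]
    w1_hat = C1.index("".join("a" if s == "A" else "b" for s in y))
    w2_hat = C2.index("".join("c" if s == "C" else "d" for s in z))  # sent over the R0 link
    return w1_hat, w2_hat

rate = (math.log2(len(C1)) + math.log2(len(C2))) / n  # bits per channel use
print(decode(encode(5, 2)), rate)
```

Every message pair is recovered, and the rate $(\log|\mathcal{C}_1| + \log|\mathcal{C}_2|)/n = 1.5 = 1 + R_0$ exceeds $C_{XY} = 1$.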



²A similar result to Theorem 10 can be derived when node $Z$ simulates $Y^n$. This will include the case when $XZY$ is statistically degraded.
³This can be considered as a special case of [6] with a specific code design.
Appendix A

A. Proof for Lemma 2

The proof follows Marton's proof [10] and the summary in El Gamal's slides [16].

The following lemma is from [10]. Recall that the KL-divergence between two distributions $P_1$ and $P_2$ is defined as $D(P_1\|P_2) := \sum_i P_1(i)\log\frac{P_1(i)}{P_2(i)}$.

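Lemma 1 of [10]: Let $Q^n := (Q_1, Q_2, \dots, Q_n)$ and $\tilde{Q}^n := (\tilde{Q}_1, \tilde{Q}_2, \dots, \tilde{Q}_n)$ be two sequences of random variables defined on $\Omega^n$. Let $Q_1, \dots, Q_n$ be independent, with distributions $P_{Q_1}, \dots, P_{Q_n}$, respectively. Denote the joint distribution of $Q^n$ as $P_{Q^n} = \prod_{i=1}^n P_{Q_i}$ and that of $\tilde{Q}^n$ as $P_{\tilde{Q}^n}$. Then there exists a joint probability distribution $P_{Q^n\tilde{Q}^n}$ with these given marginals such that

$$\frac{1}{n}\mathbb{E}\, d_H(Q^n, \tilde{Q}^n) \le \sqrt{\frac{1}{2n} D\Big(P_{\tilde{Q}^n} \,\Big\|\, \prod_{i=1}^n P_{Q_i}\Big)}.$$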

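Now define

$$P_{\tilde{Q}^n}(x^n) := \begin{cases} P_{Q^n}(x^n)/P_{Q^n}(\mathcal{A}^{(n)}), & \forall x^n \in \mathcal{A}^{(n)}, \\ 0, & \text{otherwise.} \end{cases}$$

Then $D\big(P_{\tilde{Q}^n} \,\|\, \prod_{i=1}^n P_{Q_i}\big) = -\log P_{Q^n}(\mathcal{A}^{(n)}) \le nc_n$.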

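By the lemma, there exists a joint distribution such that $\mathbb{E}\, d_H(Q^n, \tilde{Q}^n) \le n\sqrt{c_n}$. By the Markov inequality, for any $\delta > 0$,

$$\Pr\big(d_H(Q^n, \tilde{Q}^n) \ge n\delta\big) \le \frac{\mathbb{E}\, d_H(Q^n, \tilde{Q}^n)}{n\delta} \le \frac{\sqrt{c_n}}{\delta}.$$

If we choose $\delta_n = \lambda\sqrt{c_n}$ with $\lambda > 1$, we therefore have

$$\begin{aligned}
P_{Q^n}\big(\Gamma_{n\delta_n}(\mathcal{A}^{(n)})\big) &= P_{Q^n\tilde{Q}^n}\big(\Gamma_{n\delta_n}(\mathcal{A}^{(n)}) \times \mathcal{A}^{(n)}\big) + P_{Q^n\tilde{Q}^n}\big(\Gamma_{n\delta_n}(\mathcal{A}^{(n)}) \times \mathcal{A}^{(n)c}\big) \\
&= P_{Q^n\tilde{Q}^n}\big(\Gamma_{n\delta_n}(\mathcal{A}^{(n)}) \times \mathcal{A}^{(n)}\big) && (7) \\
&\ge P_{Q^n\tilde{Q}^n}\big(d_H(Q^n, \tilde{Q}^n) < n\delta_n\big) && (8) \\
&\ge 1 - 1/\lambda,
\end{aligned}$$

where $\mathcal{A}^{(n)c}$ is the complement of $\mathcal{A}^{(n)}$, and (7) and (8) follow from the fact that $P_{Q^n\tilde{Q}^n}(x^n, \tilde{x}^n) = 0$ when $\tilde{x}^n \notin \mathcal{A}^{(n)}$. □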
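The generalized blowing-up lemma can be sanity-checked by brute force on a small product space. The sketch below takes i.i.d. Bernoulli($p$) coordinates on $\{0,1\}^n$ and an arbitrary set $\mathcal{A}^{(n)}$, sets $c_n := -\frac{1}{n}\log P_{Q^n}(\mathcal{A}^{(n)})$ so that $P_{Q^n}(\mathcal{A}^{(n)}) = 2^{-nc_n}$, blows $\mathcal{A}^{(n)}$ up by Hamming radius $n\lambda\sqrt{c_n}$, and also confirms that the conditioned distribution $P_{\tilde{Q}^n}$ defined in the proof has divergence $-\log P_{Q^n}(\mathcal{A}^{(n)})$ from the product measure. The set, $n$, and $p$ are illustrative assumptions; at such small $n$ the guarantee is loose, so this only checks the statement, not its sharpness.

```python
import math

def popcount(v):
    """Hamming weight of the integer v."""
    return bin(v).count("1")

def blowing_up_check(n, p, A, lam):
    """Brute-force check of the generalized blowing-up lemma on {0,1}^n.

    Q^n has i.i.d. Bernoulli(p) coordinates; A is a list of n-bit integers.
    Returns (c_n, P(Gamma_{n delta_n}(A)), D(P_tilde || P)) with delta_n = lam * sqrt(c_n),
    where P_tilde is P conditioned on A, as in the proof.
    """
    P = {x: p ** popcount(x) * (1 - p) ** (n - popcount(x)) for x in range(2 ** n)}
    pA = sum(P[x] for x in A)
    c_n = -math.log2(pA) / n              # so that P(A) = 2^{-n c_n}
    radius = n * lam * math.sqrt(c_n)     # n * delta_n
    # Gamma: sequences within Hamming distance <= radius of some point of A
    p_gamma = sum(P[x] for x in P if min(popcount(x ^ a) for a in A) <= radius)
    # D(P_tilde || P) in bits, computed term by term from the definition
    kl = sum((P[x] / pA) * math.log2((P[x] / pA) / P[x]) for x in A)
    return c_n, p_gamma, kl

n, p = 10, 0.4
A = [x for x in range(2 ** n) if popcount(x) <= 2]   # an illustrative set
for lam in (1.1, 1.5, 2.0):
    c_n, p_gamma, kl = blowing_up_check(n, p, A, lam)
    print(f"lambda={lam}: P(Gamma)={p_gamma:.4f} vs 1-1/lambda={1 - 1 / lam:.4f}; "
          f"D={kl:.4f}, n*c_n={n * c_n:.4f}")
```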

B. Proof for Theorem 2: Connection between Conditional Entropy and the Blowing-up Lemma
We need two auxiliary lemmas. Recall that $\hat{Z}^n$ denotes the "color" of $Z^n$, i.e., the output of the coloring function on $\Omega_Z^n$ at node $Z$.

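Lemma 7. Suppose $H(\hat{Z}^n|X^n) = na_n$ with $a_n > 0$. Then for all $\mu > 1$,

$$\Pr\big(X^n \in \{x^n : H(\hat{Z}^n|X^n = x^n) < n\mu a_n\}\big) \ge 1 - 1/\mu > 0.$$

Proof: Define $\mathcal{A}_n := \{x^n : H(\hat{Z}^n|X^n = x^n) < n\mu a_n\}$. If $\Pr(\mathcal{A}_n) < 1 - 1/\mu$, then we have

$$H(\hat{Z}^n|X^n) = \sum_{x^n}\Pr(x^n)H(\hat{Z}^n|X^n = x^n) \ge \sum_{x^n \notin \mathcal{A}_n}\Pr(x^n)\, n\mu a_n > na_n.$$

This is a contradiction. □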
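Lemma 7 is Markov's inequality applied to the nonnegative quantity $H(\hat{Z}^n|X^n = x^n)$, whose average over the code book is $na_n$. The sketch below checks the statement for a uniform code book with made-up per-codeword conditional entropies; the names and numbers are hypothetical and not tied to any particular channel.

```python
import random

def low_entropy_mass(prior, cond_entropy, mu):
    """Pr(X^n in A_n) with A_n = {x : H(Zhat^n | X^n = x) < mu * H(Zhat^n | X^n)}.

    prior[x] is Pr(X^n = x); cond_entropy[x] is H(Zhat^n | X^n = x) (hypothetical values).
    """
    avg = sum(prior[x] * cond_entropy[x] for x in prior)  # H(Zhat^n | X^n) = n a_n
    return sum(prior[x] for x in prior if cond_entropy[x] < mu * avg)

rng = random.Random(0)
M = 64
prior = {x: 1.0 / M for x in range(M)}            # uniform code book
h = {x: rng.expovariate(1.0) for x in prior}      # made-up conditional entropies
for mu in (1.2, 2.0, 5.0):
    print(mu, low_entropy_mass(prior, h, mu), ">=", 1 - 1 / mu)
```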
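This lemma shows that for fixed $\mu > 1$, there exist a constant $p_0 > 0$ and a set of codewords $\mathcal{A}_n \subseteq \mathcal{C}^{(n)}$ such that $\Pr(X^n \in \mathcal{A}_n) \ge p_0 > 0$ and $H(\hat{Z}^n|X^n = x^n) < n\mu a_n$ for all $x^n \in \mathcal{A}_n$.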

The following lemma characterizes the colors which have "significant weight". 

Lemma 8. Suppose $\{p_j, j = 1, \dots, 2^{nR_0}\}$ is a probability distribution and $\sum_j -p_j\log p_j \le nb_n$. Then for any $\alpha > 1$, the set $S := \{j : p_j \ge 2^{-n\alpha b_n}\}$ has a total probability no less than $1 - 1/\alpha$.

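Proof: We show that $S^c := \{j : p_j < 2^{-n\alpha b_n}\}$ cannot have a total weight strictly larger than $1/\alpha$. We know

$$nb_n \ge \sum_j -p_j\log p_j \ge \sum_{j \in S^c} -p_j\log p_j \ge \sum_{j \in S^c} -p_j\log\big(2^{-n\alpha b_n}\big) = n\alpha b_n \cdot \Pr(S^c).$$

Thus the lemma is valid. □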
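Lemma 8 can be checked the same way: for a color distribution with entropy at most $nb_n$, the colors of probability at least $2^{-n\alpha b_n}$ carry at least $1 - 1/\alpha$ of the mass. The sketch below uses a randomly generated (hypothetical) color distribution and, by default, takes $nb_n$ equal to its entropy, the tightest value the hypothesis allows.

```python
import math
import random

def heavy_color_mass(p, alpha, nb=None):
    """Total mass of S = {j : p_j >= 2^(-alpha * n b_n)}.

    p: color distribution (list of probabilities); nb: the value n*b_n, which must be
    at least the entropy H(p) in bits (defaults to H(p), the tightest admissible choice).
    """
    H = -sum(q * math.log2(q) for q in p if q > 0)
    if nb is None:
        nb = H
    if nb < H - 1e-12:
        raise ValueError("Lemma 8 requires n*b_n >= H(p)")
    threshold = 2.0 ** (-alpha * nb)
    return sum(q for q in p if q >= threshold)

rng = random.Random(0)
w = [rng.random() ** 4 for _ in range(32)]   # a skewed, made-up color distribution
s = sum(w)
p = [x / s for x in w]
for alpha in (1.1, 2.0, 4.0):
    print(alpha, round(heavy_color_mass(p, alpha), 4), ">=", round(1 - 1 / alpha, 4))
```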
We see that by making $\alpha$ a constant larger than one, the total weight of the colors with individual weight at least $2^{-n\alpha b_n}$ is non-negligible.
We now present the proof for Theorem 2.
Proof: By Lemma 7, we know that for any $\mu > 1$,

$$\Pr\big(X^n \in \{x^n : H(\hat{Z}^n|x^n) < n\mu a_n\}\big) \ge 1 - 1/\mu.$$



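Define $\mathcal{C}_1^{(n)} := \{x^n : H(\hat{Z}^n|x^n) < n\mu a_n\}$. For each $x^n$ in $\mathcal{C}_1^{(n)}$, by definition, we have $\sum_k -p_k\log p_k \le n\mu a_n$, where $p_k := \Pr(\hat{Z}^n = k|x^n)$. Then by Lemma 8, for any $\alpha > 1$ there exists a set of colors $S$ such that: 1) $\Pr(\hat{Z}^n \in S|x^n) \ge 1 - 1/\alpha$; and 2) for each color $j \in S$, $p_j \ge 2^{-n\alpha\mu a_n}$.

For such an $x^n$ and color $j$, by Lemma 2, the generalized blowing-up lemma, we know that for any $\lambda > 1$,

$$\Pr\big(Z^n \in \Gamma_{n\lambda\sqrt{\alpha\mu a_n}}(\mathcal{A}_j^{(n)}) \,\big|\, X^n = x^n\big) \ge 1 - 1/\lambda,$$

where $\mathcal{A}_j^{(n)}$ is the set of sequences in $\Omega_Z^n$ with color $j$.

Now the theorem is proved by letting $\mu = \alpha = \lambda$. □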
Appendix B

A. Proof for Theorem^ 

By Theorem 2, for any $\lambda > 1$, there exist a set of code words $\mathcal{C}_1^{(n)}$ and a constant $p_0 := 1 - 1/\lambda > 0$ such that:

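i) $\Pr(X^n \in \mathcal{C}_1^{(n)}) \ge p_0$; and

ii) For each $x^n \in \mathcal{C}_1^{(n)}$, there exists a set of colors $S(x^n)$ such that $\Pr(\hat{Z}^n \in S(x^n)|x^n) \ge p_0$, and for each color $j \in S(x^n)$,

$$\Pr\big(Y^n \in \Gamma_{n\lambda^{3/2}\sqrt{a_n}}(\mathcal{A}_j^{(n)}) \,\big|\, x^n\big) \ge p_0, \qquad (9)$$

where $\mathcal{A}_j^{(n)} := \{z^n \in \Omega_Z^n : z^n \text{ has color } j\}$.
Note that we use $Y^n$ in (9) instead of $Z^n$ because $Y^n$ and $Z^n$ are i.i.d. given $X^n$. In other words, in the ball of radius $n\lambda^{3/2}\sqrt{a_n}$ around an independently drawn $Y^n$, with probability at least $p_0$ one can find a point with color $j$, assuming the code word sent is from $\mathcal{C}_1^{(n)}$.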

Based on this, the following procedure can be applied to decode $X^n$ solely based on $Y^n$. Randomly and uniformly pick a point $\omega^n$ in the ball centered at $Y^n$, and let $\hat{\omega}^n$ denote its color. Apply the decoding function $f_n(\hat{\omega}^n, Y^n)$ to map it to a code word, and announce that code word as the decoded one.

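Now we calculate the decoding probability. By assumption, since the code book is feasible, we have $\Pr(f_n(\hat{Z}^n, Y^n) \ne X^n) \to 0$. So there exists an integer $N_1 > 0$ such that

$$\Pr\big(f_n(\hat{Z}^n, Y^n) \ne X^n\big) < p_0^3/4, \quad n > N_1. \qquad (10)$$

Since the code words are equally likely and $\Pr(X^n \in \mathcal{C}_1^{(n)}) \ge p_0$, the Markov inequality and (10) imply that at least half of the code words in $\mathcal{C}_1^{(n)}$ satisfy

$$\Pr\big(f_n(\hat{Z}^n, Y^n) \ne X^n \,\big|\, X^n = x^n\big) < p_0, \quad n > N_1. \qquad (11)$$

Denote these code words as $\mathcal{C}_2^{(n)}$.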

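For an $x^n \in \mathcal{C}_2^{(n)}$ and a $j \in S(x^n)$, with $n > N_1$,

$$\begin{aligned}
&\Pr\big(f_n(\hat{\omega}^n, Y^n) = X^n \,\big|\, x^n, \hat{Z}^n = j\big) \\
&\ge \Pr\big(f_n(\hat{\omega}^n, Y^n) = X^n, \hat{\omega}^n = \hat{Z}^n, Y^n \in \Gamma_{n\lambda^{3/2}\sqrt{a_n}}(\mathcal{A}_j^{(n)}) \,\big|\, x^n, \hat{Z}^n = j\big) \\
&= \Pr\big(f_n(\hat{Z}^n, Y^n) = X^n, \hat{\omega}^n = \hat{Z}^n, Y^n \in \Gamma_{n\lambda^{3/2}\sqrt{a_n}}(\mathcal{A}_j^{(n)}) \,\big|\, x^n, \hat{Z}^n = j\big) \\
&= \Pr\big(f_n(\hat{Z}^n, Y^n) = X^n, Y^n \in \Gamma_{n\lambda^{3/2}\sqrt{a_n}}(\mathcal{A}_j^{(n)}) \,\big|\, x^n, \hat{Z}^n = j\big) \\
&\quad \cdot \Pr\big(\hat{\omega}^n = \hat{Z}^n \,\big|\, f_n(\hat{Z}^n, Y^n) = X^n, Y^n \in \Gamma_{n\lambda^{3/2}\sqrt{a_n}}(\mathcal{A}_j^{(n)}), X^n = x^n, \hat{Z}^n = j\big) \\
&\ge \Pr\big(f_n(\hat{Z}^n, Y^n) = X^n, Y^n \in \Gamma_{n\lambda^{3/2}\sqrt{a_n}}(\mathcal{A}_j^{(n)}) \,\big|\, x^n, \hat{Z}^n = j\big) \cdot \frac{1}{|\mathrm{Ball}(n\lambda^{3/2}\sqrt{a_n})|},
\end{aligned}$$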

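where the last inequality holds because one picks a point uniformly within the ball, which contains at least one point with color $\hat{Z}^n$. We also know

$$\begin{aligned}
&\Pr\big(f_n(\hat{Z}^n, Y^n) = X^n, Y^n \in \Gamma_{n\lambda^{3/2}\sqrt{a_n}}(\mathcal{A}_j^{(n)}) \,\big|\, x^n, \hat{Z}^n = j\big) \\
&= \Pr\big(Y^n \in \Gamma_{n\lambda^{3/2}\sqrt{a_n}}(\mathcal{A}_j^{(n)}) \,\big|\, x^n, j\big) - \Pr\big(f_n(\hat{Z}^n, Y^n) \ne X^n, Y^n \in \Gamma_{n\lambda^{3/2}\sqrt{a_n}}(\mathcal{A}_j^{(n)}) \,\big|\, x^n, j\big) \\
&\ge \Pr\big(Y^n \in \Gamma_{n\lambda^{3/2}\sqrt{a_n}}(\mathcal{A}_j^{(n)}) \,\big|\, x^n, j\big) - \Pr\big(f_n(\hat{Z}^n, Y^n) \ne X^n \,\big|\, x^n, j\big) \\
&= \Pr\big(Y^n \in \Gamma_{n\lambda^{3/2}\sqrt{a_n}}(\mathcal{A}_j^{(n)}) \,\big|\, x^n\big) - \Pr\big(f_n(\hat{Z}^n, Y^n) \ne X^n \,\big|\, x^n, j\big),
\end{aligned}$$

where the last equality comes from the fact that $Y^n$ is independent of $Z^n$ given $X^n$, and $\hat{Z}^n$ is a function of $Z^n$. Because of (9), we thus know

$$\Pr\big(f_n(\hat{\omega}^n, Y^n) = X^n \,\big|\, x^n, \hat{Z}^n = j\big) \ge \frac{p_0 - \Pr\big(f_n(\hat{Z}^n, Y^n) \ne X^n \,\big|\, x^n, j\big)}{|\mathrm{Ball}(n\lambda^{3/2}\sqrt{a_n})|}. \qquad (12)$$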
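Notice that (recall that $S(X^n)$ is the special color set of $X^n$)

$$\Pr\big(X^n \in \mathcal{C}_2^{(n)}, \hat{Z}^n \in S(X^n)\big) = \Pr\big(X^n \in \mathcal{C}_2^{(n)}\big) \cdot \Pr\big(\hat{Z}^n \in S(X^n) \,\big|\, X^n \in \mathcal{C}_2^{(n)}\big) \ge p_0^2/2.$$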



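Thus, combining with (12), we get

$$\begin{aligned}
\Pr\big(f_n(\hat{\omega}^n, Y^n) = X^n\big) &= \sum_{x^n, j} \Pr(X^n = x^n, \hat{Z}^n = j) \cdot \Pr\big(f_n(\hat{\omega}^n, Y^n) = X^n \,\big|\, X^n = x^n, \hat{Z}^n = j\big) \\
&\ge \sum_{x^n \in \mathcal{C}_2^{(n)},\, j \in S(x^n)} \Pr(X^n = x^n, \hat{Z}^n = j) \cdot \Pr\big(f_n(\hat{\omega}^n, Y^n) = X^n \,\big|\, X^n = x^n, \hat{Z}^n = j\big) \\
&\ge \frac{1}{|\mathrm{Ball}(n\lambda^{3/2}\sqrt{a_n})|} \Big( p_0 \Pr\big(X^n \in \mathcal{C}_2^{(n)}, \hat{Z}^n \in S(X^n)\big) - \Pr\big(f_n(\hat{Z}^n, Y^n) \ne X^n, X^n \in \mathcal{C}_2^{(n)}, \hat{Z}^n \in S(X^n)\big) \Big) \\
&\ge \frac{1}{|\mathrm{Ball}(n\lambda^{3/2}\sqrt{a_n})|} \Big( p_0^3/2 - \Pr\big(f_n(\hat{Z}^n, Y^n) \ne X^n\big) \Big) \\
&\ge \frac{1}{|\mathrm{Ball}(n\lambda^{3/2}\sqrt{a_n})|} \cdot \frac{p_0^3}{4},
\end{aligned}$$

where the last inequality is because of (10).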

By Arimoto's result, we have $\log|\mathrm{Ball}(n\lambda^{3/2}\sqrt{a_n})|/n + \delta_n \ge \mathcal{E}(R)$, where $\delta_n$ is a function of $n$ and $\lambda$ that vanishes as $n \to \infty$. □



B. Deriving $\mathcal{E}(R)$ for the binary erasure channel

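For an input distribution such that $\Pr(X = 0) = p$, by definition we have

$$\begin{aligned}
E_0(\rho, p) &= -\log\Big[\big(p(1-\epsilon)^{\frac{1}{1+\rho}}\big)^{1+\rho} + \big((1-p)(1-\epsilon)^{\frac{1}{1+\rho}}\big)^{1+\rho} + \big(p\,\epsilon^{\frac{1}{1+\rho}} + (1-p)\,\epsilon^{\frac{1}{1+\rho}}\big)^{1+\rho}\Big] \\
&= -\log\big[p^{1+\rho}(1-\epsilon) + (1-p)^{1+\rho}(1-\epsilon) + \epsilon\big] \\
&= -\log\big[(p^{1+\rho} + (1-p)^{1+\rho})(1-\epsilon) + \epsilon\big].
\end{aligned}$$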

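It is easy to show, by checking the sign of $\partial E_0(\rho, p)/\partial p$, that $\arg\min_p E_0(\rho, p) = 1/2$, noting $\rho < 0$. Thus

$$\begin{aligned}
\mathcal{E}(R) &= \max_{\rho \in [-1, 0)}\Big(-\rho R + \min_p E_0(\rho, p)\Big) = \max_{\rho \in [-1, 0)}\Big(-\rho R - \log\big[2^{-\rho}(1-\epsilon) + \epsilon\big]\Big) \\
&= \max_{x \in (0, 1]}\Big(Rx - \log\big[2^x(1-\epsilon) + \epsilon\big]\Big) \\
&=: \max_{x \in (0, 1]} s(x).
\end{aligned}$$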
The derivative of $s(x)$ is $s'(x) = R - \frac{2^x(1-\epsilon)}{2^x(1-\epsilon) + \epsilon}$. We observe that: 1) when $x = 0$, $s'(0) = R - (1-\epsilon) > 0$, as $R$ is larger than $C_{XY} = 1 - \epsilon$; and 2) $s'(x)$ is monotonically decreasing in $x$. For $s'(x)$ to be zero, one must have $2^x = \frac{R\epsilon}{(1-R)(1-\epsilon)}$. Hence, for $R \in \left(1-\epsilon, 1-\frac{\epsilon}{2-\epsilon}\right)$, $s'(x)$ reaches 0 inside $(0, 1)$. When $R \ge 1 - \frac{\epsilon}{2-\epsilon}$, $s'(x)$ is nonnegative on $(0, 1]$. In the latter case, we know $\mathcal{E}(R) = R - \log(2-\epsilon)$.

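In sum, we know

$$\mathcal{E}(R) = \begin{cases} R\log\dfrac{R\epsilon}{(1-R)(1-\epsilon)} - \log\left(\dfrac{R\epsilon}{1-R} + \epsilon\right), & R \in \left(1-\epsilon,\ 1-\dfrac{\epsilon}{2-\epsilon}\right); \\[2ex] R - \log(2-\epsilon), & R \ge 1-\dfrac{\epsilon}{2-\epsilon}. \end{cases}$$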
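The closed form can be cross-checked against a direct numerical evaluation of the definition: maximize over $\rho \in [-1, 0)$ on a grid and minimize $E_0(\rho, p)$ over a grid of $p$ that contains the minimizer $p = 1/2$. All logarithms are base 2; the grid sizes and the sample values $\epsilon = 0.5$ and $R \in \{0.6, 0.9\}$ are illustrative assumptions (for $\epsilon = 0.5$ the two branches meet at $R = 2/3$).

```python
import math

def E0(rho, p, eps):
    """E_0(rho, p) for the BEC with Pr(X = 0) = p, in bits."""
    s = 1.0 + rho
    return -math.log2((p ** s + (1 - p) ** s) * (1 - eps) + eps)

def exponent_numeric(R, eps, rho_grid=1000, p_grid=100):
    """max over rho in [-1, 0) of (-rho R + min_p E_0(rho, p)), both evaluated on grids."""
    best = -math.inf
    for i in range(rho_grid):
        rho = -1.0 + i / rho_grid                                      # rho in [-1, 0)
        inner = min(E0(rho, j / p_grid, eps) for j in range(1, p_grid))  # grid contains p = 1/2
        best = max(best, -rho * R + inner)
    return best

def exponent_closed(R, eps):
    """Closed-form E(R) derived above, valid for R > C_XY = 1 - eps."""
    if R <= 1 - eps:
        raise ValueError("the formula is derived for R > 1 - eps")
    if R < 1 - eps / (2 - eps):
        return (R * math.log2(R * eps / ((1 - R) * (1 - eps)))
                - math.log2(R * eps / (1 - R) + eps))
    return R - math.log2(2 - eps)

for R in (0.6, 0.9):
    print(R, round(exponent_numeric(R, 0.5), 6), round(exponent_closed(R, 0.5), 6))
```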



Appendix C
Proof for Theorem 9

We first need the following, which is almost the same as a known result in [15].

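Lemma 9. Suppose $Y^n$ is a constant for each $n$, i.e., $Y^n = y_0^n$. Then for any $\delta > 0$, there exist simulation encoding and decoding with rate $R_1 = \limsup i(X^n; Z^n|y_0^n)/n + \delta = \limsup \frac{1}{n}\log\frac{p(Z^n|X^n, y_0^n)}{p(Z^n|y_0^n)} + \delta$ and $R_2$ sufficiently large such that

$$\sum_{x^n, z^n}\left|p(x^n, z^n|y_0^n) - p(x^n|y_0^n)\,Q(z^n|x^n, y_0^n)\right| \to 0, \quad n \to \infty.$$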

Proof: The lemma is a special case of the achievability part in Section VI of [15]. As stated in Remark 3, there are two small differences between our formulation and that in [15]. First, the source $X^n$ is not generated from i.i.d. random variables $X_1, X_2, \dots, X_n$ based on a distribution $p(x)$; instead, here the source is uniformly picked from an existing code book. Secondly, there is side information in our formulation.

Replacing Lemma 6.1 in [15] with (our) Lemma 4 (which is copied from [14]) and making $R_2$ large, the same proof goes through⁴. Note that the mutual information becomes the corresponding limsup expression. As a remark, for a given $y_0^n$, the channel is a "conditioned" channel. The source distribution and channel distribution (for a given input) may differ from those without conditioning, but Lemma 4 still applies, because everything is conditioned on $y_0^n$. □

Now we are ready to prove Theorem 9.

Proof for Theorem 9: We know that

$$\sum_{x^n, y^n, z^n}\left|p(x^n, y^n, z^n) - p(x^n, y^n)Q(z^n|x^n, y^n)\right| = \sum_{y^n} p(y^n) \sum_{x^n, z^n}\left|p(x^n, z^n|y^n) - p(x^n|y^n)Q(z^n|x^n, y^n)\right|.$$

One can apply the channel simulation procedure for each $y^n$ as indicated in Lemma 9. (The detailed simulation procedure is in Section VI of [15].) This leads to a simulated distribution $p(x^n|y^n)Q(z^n|x^n, y^n)$.

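If $\sum_{y^n} p(y^n)\sum_{x^n, z^n}\left|p(x^n, z^n|y^n) - p(x^n|y^n)Q(z^n|x^n, y^n)\right| \not\to 0$ as $n \to \infty$, then there exist a sequence of integers $n_k$ going to infinity, positive constants $c_1$ and $c_2$, and events $\mathcal{A}_{n_k} \subseteq \Omega_Y^{n_k}$ such that:

1) $\Pr(Y^{n_k} \in \mathcal{A}_{n_k}) \ge c_1 > 0$;

2) for all $y^{n_k} \in \mathcal{A}_{n_k}$, $\sum_{x^{n_k}, z^{n_k}}\left|p(x^{n_k}, z^{n_k}|y^{n_k}) - p(x^{n_k}|y^{n_k})Q(z^{n_k}|x^{n_k}, y^{n_k})\right| \ge c_2$.

By Lemma 9, we must have

$$\limsup_k \frac{1}{n_k} i(X^{n_k}; Z^{n_k}|Y^{n_k} = y^{n_k}) > R_1, \quad \forall y^{n_k} \in \mathcal{A}_{n_k}. \qquad (13)$$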

⁴Actually, Lemma 6.1 in [15] is a simplified version of Lemma 4.
On the other hand, since $R_1 = \limsup i(X^n; Z^n|Y^n)/n + \delta$, by definition we have

$$\Pr\left(\frac{1}{n} i(X^n; Z^n|Y^n) > R_1\right) \to 0, \quad n \to \infty.$$

Because this probability equals $\sum_{y^n} p(y^n)\Pr\big(\frac{1}{n} i(X^n; Z^n|Y^n) > R_1 \,\big|\, Y^n = y^n\big)$, we know that the set of $y^n$'s such that $\limsup i(X^n; Z^n|Y^n = y^n)/n > R_1$ must have a probability going to zero as $n$ goes to infinity. This contradicts (13) above. □



Appendix D 

A. Proof for Lemma 6

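By definition, $I(X^n; Y^n) = \mathbb{E}\, i(X^n; Y^n)$. Since $\liminf i(X^n; Y^n)/n \le C_{XY} - c_1$, for any $\delta > 0$ there exist a sequence $\{n_k\}$ and $p_\delta > 0$ such that

$$\Pr\big(i(X^{n_k}; Y^{n_k})/n_k < C_{XY} - c_1 + \delta\big) \ge p_\delta. \qquad (14)$$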

Furthermore, for any $n$ we have

$$\frac{i(X^n; Y^n)}{n} = \frac{1}{n}\log\frac{p(X^n|Y^n)}{p(X^n)} \le \frac{1}{n}\log\frac{1}{p(X^n)} = \frac{1}{n}\log 2^{nR} = R. \qquad (15)$$

Also, since the channel $XY$ satisfies the strong converse property, by Lemma 10 of [14] we know there exists $\epsilon_k \to 0$ such that

$$\Pr\left(\frac{i(X^{n_k}; Y^{n_k})}{n_k} < C_{XY} + \epsilon_k\right) \to 1, \quad k \to \infty. \qquad (16)$$

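Combining (14), (15), and (16), we know

$$\begin{aligned}
I(X^{n_k}; Y^{n_k})/n_k &= \mathbb{E}\, i(X^{n_k}; Y^{n_k})/n_k \\
&\le (C_{XY} - c_1 + \delta)p_\delta + (C_{XY} + \epsilon_k)(1 - p_\delta - o(1)) + R \cdot o(1) \\
&= C_{XY} - (c_1 - \delta)p_\delta + o(1).
\end{aligned}$$

The lemma is proved by defining $c_2 = \max_{\delta > 0}(c_1 - \delta)p_\delta$. □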
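B. Proof for Theorem 10

First focus on the relay channel. Denote $a_n := H(\hat{Z}^n|X^n)/n$. For any $n$, we know $H(X^n) = nR$ and, by Fano's lemma, $H(X^n|Y^n, \hat{Z}^n) = n \cdot o(1)$. We have

$$\begin{aligned}
n(R + o(1)) &= I(X^n; Y^n, \hat{Z}^n) \\
&= I(X^n; Y^n) + H(\hat{Z}^n|Y^n) - H(\hat{Z}^n|X^n, Y^n) \le I(X^n; Y^n) + nR_0 - na_n,
\end{aligned}$$

where the inequality uses $H(\hat{Z}^n|Y^n) \le nR_0$ and $H(\hat{Z}^n|X^n, Y^n) = H(\hat{Z}^n|X^n) = na_n$, since $Y^n$ and $Z^n$ are independent given $X^n$.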



That is, $R \le I(X^n; Y^n)/n + R_0 - a_n + o(1)$. By Lemma 6, we know there exists $n_k$ going to infinity such that

$$R \le C_{XY} - c_2 + R_0 - a_{n_k} + o(1). \qquad (17)$$

Denoting $a := \limsup a_{n_k}$, we have

$$R \le C_{XY} - c_2 + R_0 - a. \qquad (18)$$

This satisfies i) and iii).

Now we show ii) by applying results from channel simulation to the companion channel $\tilde{p}(y, z|x)$. One can simulate the companion channel $\tilde{p}(y, z|x)$ as follows. By Lemma 5 and Theorem 9, for any $\delta > 0$, with rate

$$R_1 := \tilde{C}_{XYZ} - \tilde{C}_{XY} + c_1 + \delta, \qquad (19)$$

one can encode the channel $\tilde{p}(x^n, y^n, z^n)$ based on side information $Y^n$ and common randomness $K$ by applying $\phi_n(X^n, Y^n, K)$. Given $U$, the channel decoder generates an output based on $U$, $Y^n$, and $K$ by the function $\psi_n(U, Y^n, K)$. And we know that

$$\sum_{x^n, y^n, z^n}\left|\tilde{p}(x^n, y^n, z^n) - \tilde{p}(x^n, y^n)Q(z^n|x^n, y^n)\right| \to 0, \quad n \to \infty.$$

In the relay channel, node $Y$ can use this channel simulation to produce a $\tilde{Z}^n$ whose distribution is close to that of the relay's observation, as follows. To generate a channel simulation output, it needs $U$, $K$, and $Y^n$. It has $Y^n$ because it observes it directly. It has $K$ because it is common randomness, a random variable uniformly distributed on $\{1, 2, \dots, 2^{nR_2}\}$. For $U$, there are $2^{nR_1}$ possibilities in total. Node $Y$ picks a $\hat{U}$ uniformly from $\{1, 2, \dots, 2^{nR_1}\}$ and generates a $\tilde{Z}^n$ based on $\psi_n(\hat{U}, Y^n, K)$. Note that the probability of hitting the correct $U$ is $2^{-nR_1}$.

Assume that $\tilde{Z}^n$ is the channel simulation output. Node $Y$ can apply the same procedure and argument as in Section IV to guess $X^n$. Specifically, it draws a ball of radius $n\lambda^{3/2}\sqrt{a_n}$ around $\tilde{Z}^n$ in the space $\Omega_Z^n$, for a constant $\lambda > 1$. Then it picks a point $\omega^n$ uniformly in the ball and applies the known decoding function $f_n(\hat{\omega}^n, Y^n)$ to guess $X^n$, where $\hat{\omega}^n$ is the color of $\omega^n$.

Now we analyze the decoding probability. Suppose $Z_1^n$ is a random variable that, given $X^n$, is independent of $Z^n$ and has the same conditional distribution. We know, as in Section IV, that when $X^n$ is from a non-diminishing set of code words $\mathcal{C}_1^{(n)}$, the ball around $Z_1^n$ of radius $n\lambda^{3/2}\sqrt{a_n}$ contains a point with the same color as $Z^n$ with probability no less than a positive constant $p_0$. To be specific, assume $\hat{Z}^n = c$ and define

$$\Gamma_c^{(n)} := \{z^n \in \Omega_Z^n : \text{there is a } z'^n \text{ with color } c \text{ and } d_H(z^n, z'^n) \le n\lambda^{3/2}\sqrt{a_n}\}.$$

That is, $\Gamma_c^{(n)}$ is the blown-up set of the points with color $c$.



We have

$$\Pr\big(Z_1^n \in \Gamma_{\hat{Z}^n}^{(n)} \,\big|\, X^n \in \mathcal{C}_1^{(n)}\big) \ge p_0.$$



However, the above analysis is for a hypothetical $Z_1^n$. What the channel simulation really generates is $\tilde{Z}^n$. For $\tilde{Z}^n$, because of (6), we know

$$\Pr\big(\tilde{Z}^n \in \Gamma_{\hat{Z}^n}^{(n)} \,\big|\, X^n \in \mathcal{C}_1^{(n)}\big) = \Pr\big(Z_1^n \in \Gamma_{\hat{Z}^n}^{(n)} \,\big|\, X^n \in \mathcal{C}_1^{(n)}\big) + o(1) \ge p_0 + o(1).$$

That is, the ball around $\tilde{Z}^n$ of radius $n\lambda^{3/2}\sqrt{a_n}$ still contains the color of $Z^n$ with non-diminishing probability.

Now we can bound the decoding probability as in Section IV:

$$\Pr(\text{Node } Y \text{ can decode correctly}) \ge 2^{-nR_1} \cdot \frac{\mu_1 + o(1)}{|\mathrm{Ball}(n\lambda^{3/2}\sqrt{a_n})|},$$

where $\mu_1 > 0$ is a function of $p_0$. In words, node $Y$ decodes correctly if the correct $U$ is used and the correct color is hit in the ball. Based on the result of Arimoto [9], one must have



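$$\mathcal{E}_Y(R) \le R_1 + \limsup_{k} \frac{1}{n_k}\log\left|\mathrm{Ball}(n_k\lambda^{3/2}\sqrt{a_{n_k}})\right|.$$

Recall that $n_k$ and $R_1$ are defined in (17) and (19), respectively. Letting $\lambda$ go to one and then $k$ go to infinity (and noting that $\delta > 0$ in (19) is arbitrary), we get

$$\mathcal{E}_Y(R) \le \tilde{C}_{XYZ} - \tilde{C}_{XY} + c_1 + H_2(\sqrt{a}) + \sqrt{a}\log|\Omega_Z|.$$

□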



